From nobody Sun Sep 14 05:14:47 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Kevin Brodsky, Alexander Gordeev,
 Andreas Larsson, Andrew Morton, Boris Ostrovsky, Borislav Petkov,
 Catalin Marinas, Christophe Leroy, Dave Hansen, David Hildenbrand,
 "David S. Miller", "H. Peter Anvin", Ingo Molnar, Jann Horn,
 Juergen Gross, "Liam R. Howlett", Lorenzo Stoakes, Madhavan Srinivasan,
 Michael Ellerman, Michal Hocko, Mike Rapoport, Nicholas Piggin,
 Peter Zijlstra, Ryan Roberts, Suren Baghdasaryan, Thomas Gleixner,
 Vlastimil Babka, Will Deacon, Yeoreum Yun,
 linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
 sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org
Subject: [PATCH v2 1/7] mm: remove arch_flush_lazy_mmu_mode()
Date: Mon, 8 Sep 2025 08:39:25 +0100
Message-ID: <20250908073931.4159362-2-kevin.brodsky@arm.com>
In-Reply-To: <20250908073931.4159362-1-kevin.brodsky@arm.com>
References: <20250908073931.4159362-1-kevin.brodsky@arm.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

This function has only ever been used in arch/x86, so there is no need
for other architectures to implement it. Remove it from linux/pgtable.h
and all architectures besides x86.

The arm64 implementation is not empty but it is only called from
arch_leave_lazy_mmu_mode(), so we can simply fold it there.
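
For readers skimming the series, the shape of the fold is shown by the
following standalone C model (not kernel code: the thread flags are
modelled with plain booleans and emit_pte_barriers() is stubbed out).
The pending-barrier check that used to live in arch_flush_lazy_mmu_mode()
now sits directly in arch_leave_lazy_mmu_mode().

#include <stdbool.h>
#include <stdio.h>

static bool tif_lazy_mmu;          /* models TIF_LAZY_MMU         */
static bool tif_lazy_mmu_pending;  /* models TIF_LAZY_MMU_PENDING */

static void emit_pte_barriers(void) { puts("dsb/isb"); }  /* stub */

/* leave() now performs the flush itself, since the separate flush()
 * helper had no caller outside arch_leave_lazy_mmu_mode(). */
static void arch_leave_lazy_mmu_mode(void)
{
        if (tif_lazy_mmu_pending) {
                tif_lazy_mmu_pending = false;
                emit_pte_barriers();
        }
        tif_lazy_mmu = false;
}

int main(void)
{
        tif_lazy_mmu = true;
        tif_lazy_mmu_pending = true;
        arch_leave_lazy_mmu_mode();  /* emits barriers, clears both flags */
        return 0;
}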
Acked-by: Mike Rapoport (Microsoft) Signed-off-by: Kevin Brodsky Acked-by: David Hildenbrand Reviewed-by: Yeoreum Yun --- arch/arm64/include/asm/pgtable.h | 9 +-------- arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 2 -- arch/sparc/include/asm/tlbflush_64.h | 1 - arch/x86/include/asm/pgtable.h | 3 ++- include/linux/pgtable.h | 1 - 5 files changed, 3 insertions(+), 13 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index abd2dee416b3..728d7b6ed20a 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -101,21 +101,14 @@ static inline void arch_enter_lazy_mmu_mode(void) set_thread_flag(TIF_LAZY_MMU); } =20 -static inline void arch_flush_lazy_mmu_mode(void) +static inline void arch_leave_lazy_mmu_mode(void) { if (in_interrupt()) return; =20 if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING)) emit_pte_barriers(); -} - -static inline void arch_leave_lazy_mmu_mode(void) -{ - if (in_interrupt()) - return; =20 - arch_flush_lazy_mmu_mode(); clear_thread_flag(TIF_LAZY_MMU); } =20 diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powe= rpc/include/asm/book3s/64/tlbflush-hash.h index 146287d9580f..176d7fd79eeb 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h @@ -55,8 +55,6 @@ static inline void arch_leave_lazy_mmu_mode(void) preempt_enable(); } =20 -#define arch_flush_lazy_mmu_mode() do {} while (0) - extern void hash__tlbiel_all(unsigned int action); =20 extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, diff --git a/arch/sparc/include/asm/tlbflush_64.h b/arch/sparc/include/asm/= tlbflush_64.h index 8b8cdaa69272..cd144eb31bdd 100644 --- a/arch/sparc/include/asm/tlbflush_64.h +++ b/arch/sparc/include/asm/tlbflush_64.h @@ -44,7 +44,6 @@ void flush_tlb_kernel_range(unsigned long start, unsigned= long end); void flush_tlb_pending(void); void arch_enter_lazy_mmu_mode(void); void arch_leave_lazy_mmu_mode(void); -#define arch_flush_lazy_mmu_mode() do {} while (0) =20 /* Local cpu only. 
*/ void __flush_tlb_all(void); diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index e33df3da6980..14fd672bc9b2 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -117,7 +117,8 @@ extern pmdval_t early_pmd_flags; #define pte_val(x) native_pte_val(x) #define __pte(x) native_make_pte(x) =20 -#define arch_end_context_switch(prev) do {} while(0) +#define arch_end_context_switch(prev) do {} while (0) +#define arch_flush_lazy_mmu_mode() do {} while (0) #endif /* CONFIG_PARAVIRT_XXL */ =20 static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set) diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 94249e671a7e..8d6007123cdf 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -234,7 +234,6 @@ static inline int pmd_dirty(pmd_t pmd) #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE #define arch_enter_lazy_mmu_mode() do {} while (0) #define arch_leave_lazy_mmu_mode() do {} while (0) -#define arch_flush_lazy_mmu_mode() do {} while (0) #endif =20 #ifndef pte_batch_hint --=20 2.47.0 From nobody Sun Sep 14 05:14:47 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=fail(p=none dis=none) header.from=arm.com Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1757317267951882.3969608597225; Mon, 8 Sep 2025 00:41:07 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1114635.1461483 (Exim 4.92) (envelope-from ) id 1uvWUa-000545-Fp; Mon, 08 Sep 2025 07:40:48 +0000 Received: by outflank-mailman (output) from mailman id 1114635.1461483; Mon, 08 Sep 2025 07:40:48 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1uvWUa-00053w-As; Mon, 08 Sep 2025 07:40:48 +0000 Received: by outflank-mailman (input) for mailman id 1114635; Mon, 08 Sep 2025 07:40:47 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1uvWUZ-0004k6-4k for xen-devel@lists.xenproject.org; Mon, 08 Sep 2025 07:40:47 +0000 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by se1-gles-flk1.inumbo.com (Halon) with ESMTP id 20afcf27-8c87-11f0-9809-7dc792cee155; Mon, 08 Sep 2025 09:40:44 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id B33CE1762; Mon, 8 Sep 2025 00:40:35 -0700 (PDT) Received: from e123572-lin.arm.com (e123572-lin.cambridge.arm.com [10.1.194.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 1EDB53F63F; Mon, 8 Sep 2025 00:40:38 -0700 (PDT) X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 20afcf27-8c87-11f0-9809-7dc792cee155 From: Kevin Brodsky To: linux-mm@kvack.org Cc: 
linux-kernel@vger.kernel.org, Kevin Brodsky , Alexander Gordeev , Andreas Larsson , Andrew Morton , Boris Ostrovsky , Borislav Petkov , Catalin Marinas , Christophe Leroy , Dave Hansen , David Hildenbrand , "David S. Miller" , "H. Peter Anvin" , Ingo Molnar , Jann Horn , Juergen Gross , "Liam R. Howlett" , Lorenzo Stoakes , Madhavan Srinivasan , Michael Ellerman , Michal Hocko , Mike Rapoport , Nicholas Piggin , Peter Zijlstra , Ryan Roberts , Suren Baghdasaryan , Thomas Gleixner , Vlastimil Babka , Will Deacon , Yeoreum Yun , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v2 2/7] mm: introduce local state for lazy_mmu sections Date: Mon, 8 Sep 2025 08:39:26 +0100 Message-ID: <20250908073931.4159362-3-kevin.brodsky@arm.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250908073931.4159362-1-kevin.brodsky@arm.com> References: <20250908073931.4159362-1-kevin.brodsky@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZM-MESSAGEID: 1757317271291124100 Content-Type: text/plain; charset="utf-8" arch_{enter,leave}_lazy_mmu_mode() currently have a stateless API (taking and returning no value). This is proving problematic in situations where leave() needs to restore some context back to its original state (before enter() was called). In particular, this makes it difficult to support the nesting of lazy_mmu sections - leave() does not know whether the matching enter() call occurred while lazy_mmu was already enabled, and whether to disable it or not. This patch gives all architectures the chance to store local state while inside a lazy_mmu section by making enter() return some value, storing it in a local variable, and having leave() take that value. That value is typed lazy_mmu_state_t - each architecture defining __HAVE_ARCH_ENTER_LAZY_MMU_MODE is free to define it as it sees fit. For now we define it as int everywhere, which is sufficient to support nesting. The diff is unfortunately rather large as all the API changes need to be done atomically. Main parts: * Changing the prototypes of arch_{enter,leave}_lazy_mmu_mode() in generic and arch code, and introducing lazy_mmu_state_t. * Introducing LAZY_MMU_{DEFAULT,NESTED} for future support of nesting. enter() always returns LAZY_MMU_DEFAULT for now. (linux/mm_types.h is not the most natural location for defining those constants, but there is no other obvious header that is accessible where arch's implement the helpers.) * Changing all lazy_mmu sections to introduce a lazy_mmu_state local variable, having enter() set it and leave() take it. Most of these changes were generated using the following Coccinelle script: @@ @@ { + lazy_mmu_state_t lazy_mmu_state; ... - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); ... - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); ... } * In a few cases (e.g. xen_flush_lazy_mmu()), a function knows that lazy_mmu is already enabled, and it temporarily disables it by calling leave() and then enter() again. Here we want to ensure that any operation between the leave() and enter() calls is completed immediately; for that reason we pass LAZY_MMU_DEFAULT to leave() to fully disable lazy_mmu. enter() will then re-enable it - this achieves the expected behaviour, whether nesting occurred before that function was called or not. 
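
To make the new calling convention described above concrete, here is a
minimal, compilable sketch of the caller-side pattern that the Coccinelle
rule produces, built on the generic (no-op) fallback definitions from this
patch. process_ptes() and some_pte_walker() are placeholders for
illustration only, not real kernel functions.

#include <stdio.h>

/* Generic fallback, as in include/linux/pgtable.h after this patch. */
typedef int lazy_mmu_state_t;
#define LAZY_MMU_DEFAULT 0
#define LAZY_MMU_NESTED  1
#define arch_enter_lazy_mmu_mode()      (LAZY_MMU_DEFAULT)
#define arch_leave_lazy_mmu_mode(state) ((void)(state))

static void process_ptes(void) { puts("batched PTE updates"); }  /* placeholder */

static void some_pte_walker(void)
{
        lazy_mmu_state_t lazy_mmu_state;

        lazy_mmu_state = arch_enter_lazy_mmu_mode();
        process_ptes();
        arch_leave_lazy_mmu_mode(lazy_mmu_state);
}

int main(void) { some_pte_walker(); return 0; }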
Note: it is difficult to provide a default definition of lazy_mmu_state_t for architectures implementing lazy_mmu, because that definition would need to be available in arch/x86/include/asm/paravirt_types.h and adding a new generic #include there is very tricky due to the existing header soup. Acked-by: Mike Rapoport (Microsoft) Signed-off-by: Kevin Brodsky Reviewed-by: Juergen Gross # arch/x86 Reviewed-by: Yeoreum Yun --- arch/arm64/include/asm/pgtable.h | 10 +++++++--- .../include/asm/book3s/64/tlbflush-hash.h | 9 ++++++--- arch/powerpc/mm/book3s64/hash_tlb.c | 10 ++++++---- arch/powerpc/mm/book3s64/subpage_prot.c | 5 +++-- arch/sparc/include/asm/tlbflush_64.h | 5 +++-- arch/sparc/mm/tlb.c | 6 ++++-- arch/x86/include/asm/paravirt.h | 6 ++++-- arch/x86/include/asm/paravirt_types.h | 2 ++ arch/x86/xen/enlighten_pv.c | 2 +- arch/x86/xen/mmu_pv.c | 2 +- fs/proc/task_mmu.c | 5 +++-- include/linux/mm_types.h | 3 +++ include/linux/pgtable.h | 6 ++++-- mm/kasan/shadow.c | 4 ++-- mm/madvise.c | 20 ++++++++++--------- mm/memory.c | 20 +++++++++++-------- mm/migrate_device.c | 5 +++-- mm/mprotect.c | 5 +++-- mm/mremap.c | 5 +++-- mm/userfaultfd.c | 5 +++-- mm/vmalloc.c | 15 ++++++++------ mm/vmscan.c | 15 ++++++++------ 22 files changed, 102 insertions(+), 63 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 728d7b6ed20a..816197d08165 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -81,7 +81,9 @@ static inline void queue_pte_barriers(void) } =20 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE -static inline void arch_enter_lazy_mmu_mode(void) +typedef int lazy_mmu_state_t; + +static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { /* * lazy_mmu_mode is not supposed to permit nesting. But in practice this @@ -96,12 +98,14 @@ static inline void arch_enter_lazy_mmu_mode(void) */ =20 if (in_interrupt()) - return; + return LAZY_MMU_DEFAULT; =20 set_thread_flag(TIF_LAZY_MMU); + + return LAZY_MMU_DEFAULT; } =20 -static inline void arch_leave_lazy_mmu_mode(void) +static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) { if (in_interrupt()) return; diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powe= rpc/include/asm/book3s/64/tlbflush-hash.h index 176d7fd79eeb..c9f1e819e567 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h @@ -25,13 +25,14 @@ DECLARE_PER_CPU(struct ppc64_tlb_batch, ppc64_tlb_batch= ); extern void __flush_tlb_pending(struct ppc64_tlb_batch *batch); =20 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE +typedef int lazy_mmu_state_t; =20 -static inline void arch_enter_lazy_mmu_mode(void) +static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { struct ppc64_tlb_batch *batch; =20 if (radix_enabled()) - return; + return LAZY_MMU_DEFAULT; /* * apply_to_page_range can call us this preempt enabled when * operating on kernel page tables. 
@@ -39,9 +40,11 @@ static inline void arch_enter_lazy_mmu_mode(void) preempt_disable(); batch =3D this_cpu_ptr(&ppc64_tlb_batch); batch->active =3D 1; + + return LAZY_MMU_DEFAULT; } =20 -static inline void arch_leave_lazy_mmu_mode(void) +static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) { struct ppc64_tlb_batch *batch; =20 diff --git a/arch/powerpc/mm/book3s64/hash_tlb.c b/arch/powerpc/mm/book3s64= /hash_tlb.c index 21fcad97ae80..ee664f88e679 100644 --- a/arch/powerpc/mm/book3s64/hash_tlb.c +++ b/arch/powerpc/mm/book3s64/hash_tlb.c @@ -189,6 +189,7 @@ void hash__tlb_flush(struct mmu_gather *tlb) */ void __flush_hash_table_range(unsigned long start, unsigned long end) { + lazy_mmu_state_t lazy_mmu_state; int hugepage_shift; unsigned long flags; =20 @@ -205,7 +206,7 @@ void __flush_hash_table_range(unsigned long start, unsi= gned long end) * way to do things but is fine for our needs here. */ local_irq_save(flags); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); for (; start < end; start +=3D PAGE_SIZE) { pte_t *ptep =3D find_init_mm_pte(start, &hugepage_shift); unsigned long pte; @@ -217,12 +218,13 @@ void __flush_hash_table_range(unsigned long start, un= signed long end) continue; hpte_need_flush(&init_mm, start, ptep, pte, hugepage_shift); } - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); local_irq_restore(flags); } =20 void flush_hash_table_pmd_range(struct mm_struct *mm, pmd_t *pmd, unsigned= long addr) { + lazy_mmu_state_t lazy_mmu_state; pte_t *pte; pte_t *start_pte; unsigned long flags; @@ -237,7 +239,7 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, p= md_t *pmd, unsigned long * way to do things but is fine for our needs here. */ local_irq_save(flags); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); start_pte =3D pte_offset_map(pmd, addr); if (!start_pte) goto out; @@ -249,6 +251,6 @@ void flush_hash_table_pmd_range(struct mm_struct *mm, p= md_t *pmd, unsigned long } pte_unmap(start_pte); out: - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); local_irq_restore(flags); } diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book= 3s64/subpage_prot.c index ec98e526167e..4720f9f321af 100644 --- a/arch/powerpc/mm/book3s64/subpage_prot.c +++ b/arch/powerpc/mm/book3s64/subpage_prot.c @@ -53,6 +53,7 @@ void subpage_prot_free(struct mm_struct *mm) static void hpte_flush_range(struct mm_struct *mm, unsigned long addr, int npages) { + lazy_mmu_state_t lazy_mmu_state; pgd_t *pgd; p4d_t *p4d; pud_t *pud; @@ -73,13 +74,13 @@ static void hpte_flush_range(struct mm_struct *mm, unsi= gned long addr, pte =3D pte_offset_map_lock(mm, pmd, addr, &ptl); if (!pte) return; - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); for (; npages > 0; --npages) { pte_update(mm, addr, pte, 0, 0, 0); addr +=3D PAGE_SIZE; ++pte; } - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(pte - 1, ptl); } =20 diff --git a/arch/sparc/include/asm/tlbflush_64.h b/arch/sparc/include/asm/= tlbflush_64.h index cd144eb31bdd..02c93a4e6af5 100644 --- a/arch/sparc/include/asm/tlbflush_64.h +++ b/arch/sparc/include/asm/tlbflush_64.h @@ -40,10 +40,11 @@ static inline void flush_tlb_range(struct vm_area_struc= t *vma, void flush_tlb_kernel_range(unsigned long start, unsigned long end); =20 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE +typedef int lazy_mmu_state_t; =20 void flush_tlb_pending(void); -void 
arch_enter_lazy_mmu_mode(void); -void arch_leave_lazy_mmu_mode(void); +lazy_mmu_state_t arch_enter_lazy_mmu_mode(void); +void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state); =20 /* Local cpu only. */ void __flush_tlb_all(void); diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index a35ddcca5e76..bf5094b770af 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -50,16 +50,18 @@ void flush_tlb_pending(void) put_cpu_var(tlb_batch); } =20 -void arch_enter_lazy_mmu_mode(void) +lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { struct tlb_batch *tb; =20 preempt_disable(); tb =3D this_cpu_ptr(&tlb_batch); tb->active =3D 1; + + return LAZY_MMU_DEFAULT; } =20 -void arch_leave_lazy_mmu_mode(void) +void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) { struct tlb_batch *tb =3D this_cpu_ptr(&tlb_batch); =20 diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravir= t.h index b5e59a7ba0d0..65a0d394fba1 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -527,12 +527,14 @@ static inline void arch_end_context_switch(struct tas= k_struct *next) } =20 #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE -static inline void arch_enter_lazy_mmu_mode(void) +static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { PVOP_VCALL0(mmu.lazy_mode.enter); + + return LAZY_MMU_DEFAULT; } =20 -static inline void arch_leave_lazy_mmu_mode(void) +static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) { PVOP_VCALL0(mmu.lazy_mode.leave); } diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/p= aravirt_types.h index 37a8627d8277..bc1af86868a3 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -41,6 +41,8 @@ struct pv_info { }; =20 #ifdef CONFIG_PARAVIRT_XXL +typedef int lazy_mmu_state_t; + struct pv_lazy_ops { /* Set deferred update mode, used for batching operations. 
*/ void (*enter)(void); diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c index 26bbaf4b7330..a245ba47a631 100644 --- a/arch/x86/xen/enlighten_pv.c +++ b/arch/x86/xen/enlighten_pv.c @@ -426,7 +426,7 @@ static void xen_start_context_switch(struct task_struct= *prev) BUG_ON(preemptible()); =20 if (this_cpu_read(xen_lazy_mode) =3D=3D XEN_LAZY_MMU) { - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(LAZY_MMU_DEFAULT); set_ti_thread_flag(task_thread_info(prev), TIF_LAZY_MMU_UPDATES); } enter_lazy(XEN_LAZY_CPU); diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 2a4a8deaf612..2039d5132ca3 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -2140,7 +2140,7 @@ static void xen_flush_lazy_mmu(void) preempt_disable(); =20 if (xen_get_lazy_mode() =3D=3D XEN_LAZY_MMU) { - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(LAZY_MMU_DEFAULT); arch_enter_lazy_mmu_mode(); } =20 diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index ced01cf3c5ab..02aa55f83bae 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -2682,6 +2682,7 @@ static int pagemap_scan_thp_entry(pmd_t *pmd, unsigne= d long start, static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start, unsigned long end, struct mm_walk *walk) { + lazy_mmu_state_t lazy_mmu_state; struct pagemap_scan_private *p =3D walk->private; struct vm_area_struct *vma =3D walk->vma; unsigned long addr, flush_end =3D 0; @@ -2700,7 +2701,7 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, return 0; } =20 - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 if ((p->arg.flags & PM_SCAN_WP_MATCHING) && !p->vec_out) { /* Fast path for performing exclusive WP */ @@ -2770,7 +2771,7 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigne= d long start, if (flush_end) flush_tlb_range(vma, start, addr); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(start_pte, ptl); =20 cond_resched(); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 275e8060d918..143d819c1386 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -1489,6 +1489,9 @@ extern void tlb_gather_mmu(struct mmu_gather *tlb, st= ruct mm_struct *mm); extern void tlb_gather_mmu_fullmm(struct mmu_gather *tlb, struct mm_struct= *mm); extern void tlb_finish_mmu(struct mmu_gather *tlb); =20 +#define LAZY_MMU_DEFAULT 0 +#define LAZY_MMU_NESTED 1 + struct vm_fault; =20 /** diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 8d6007123cdf..df0eb898b3fc 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -232,8 +232,10 @@ static inline int pmd_dirty(pmd_t pmd) * and the mode cannot be used in interrupt context. 
*/ #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE -#define arch_enter_lazy_mmu_mode() do {} while (0) -#define arch_leave_lazy_mmu_mode() do {} while (0) +typedef int lazy_mmu_state_t; + +#define arch_enter_lazy_mmu_mode() (LAZY_MMU_DEFAULT) +#define arch_leave_lazy_mmu_mode(state) ((void)(state)) #endif =20 #ifndef pte_batch_hint diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c index 5d2a876035d6..60b1b72f5ce1 100644 --- a/mm/kasan/shadow.c +++ b/mm/kasan/shadow.c @@ -305,7 +305,7 @@ static int kasan_populate_vmalloc_pte(pte_t *ptep, unsi= gned long addr, pte_t pte; int index; =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(LAZY_MMU_DEFAULT); =20 index =3D PFN_DOWN(addr - data->start); page =3D data->pages[index]; @@ -482,7 +482,7 @@ static int kasan_depopulate_vmalloc_pte(pte_t *ptep, un= signed long addr, pte_t pte; int none; =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(LAZY_MMU_DEFAULT); =20 spin_lock(&init_mm.page_table_lock); pte =3D ptep_get(ptep); diff --git a/mm/madvise.c b/mm/madvise.c index 35ed4ab0d7c5..72c032f2cf56 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -357,6 +357,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { + lazy_mmu_state_t lazy_mmu_state; struct madvise_walk_private *private =3D walk->private; struct mmu_gather *tlb =3D private->tlb; bool pageout =3D private->pageout; @@ -455,7 +456,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, if (!start_pte) return 0; flush_tlb_batched_pending(mm); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); for (; addr < end; pte +=3D nr, addr +=3D nr * PAGE_SIZE) { nr =3D 1; ptent =3D ptep_get(pte); @@ -463,7 +464,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, if (++batch_count =3D=3D SWAP_CLUSTER_MAX) { batch_count =3D 0; if (need_resched()) { - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(start_pte, ptl); cond_resched(); goto restart; @@ -499,7 +500,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, if (!folio_trylock(folio)) continue; folio_get(folio); - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(start_pte, ptl); start_pte =3D NULL; err =3D split_folio(folio); @@ -510,7 +511,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, if (!start_pte) break; flush_tlb_batched_pending(mm); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); if (!err) nr =3D 0; continue; @@ -558,7 +559,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, } =20 if (start_pte) { - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(start_pte, ptl); } if (pageout) @@ -657,6 +658,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, =20 { const cydp_t cydp_flags =3D CYDP_CLEAR_YOUNG | CYDP_CLEAR_DIRTY; + lazy_mmu_state_t lazy_mmu_state; struct mmu_gather *tlb =3D walk->private; struct mm_struct *mm =3D tlb->mm; struct vm_area_struct *vma =3D walk->vma; @@ -677,7 +679,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, if (!start_pte) return 0; flush_tlb_batched_pending(mm); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); for (; addr !=3D end; pte +=3D nr, addr +=3D PAGE_SIZE * nr) { nr =3D 1; ptent =3D ptep_get(pte); @@ -727,7 +729,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, if (!folio_trylock(folio)) continue; 
folio_get(folio); - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(start_pte, ptl); start_pte =3D NULL; err =3D split_folio(folio); @@ -738,7 +740,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, if (!start_pte) break; flush_tlb_batched_pending(mm); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); if (!err) nr =3D 0; continue; @@ -778,7 +780,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned = long addr, if (nr_swap) add_mm_counter(mm, MM_SWAPENTS, nr_swap); if (start_pte) { - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(start_pte, ptl); } cond_resched(); diff --git a/mm/memory.c b/mm/memory.c index d9de6c056179..a60aae069f1e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1207,6 +1207,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, pmd_t *dst_pmd, pmd_t *src_pmd, unsigned long addr, unsigned long end) { + lazy_mmu_state_t lazy_mmu_state; struct mm_struct *dst_mm =3D dst_vma->vm_mm; struct mm_struct *src_mm =3D src_vma->vm_mm; pte_t *orig_src_pte, *orig_dst_pte; @@ -1254,7 +1255,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); orig_src_pte =3D src_pte; orig_dst_pte =3D dst_pte; - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 do { nr =3D 1; @@ -1323,7 +1324,7 @@ copy_pte_range(struct vm_area_struct *dst_vma, struct= vm_area_struct *src_vma, } while (dst_pte +=3D nr, src_pte +=3D nr, addr +=3D PAGE_SIZE * nr, addr !=3D end); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(orig_src_pte, src_ptl); add_mm_rss_vec(dst_mm, rss); pte_unmap_unlock(orig_dst_pte, dst_ptl); @@ -1822,6 +1823,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, unsigned long addr, unsigned long end, struct zap_details *details) { + lazy_mmu_state_t lazy_mmu_state; bool force_flush =3D false, force_break =3D false; struct mm_struct *mm =3D tlb->mm; int rss[NR_MM_COUNTERS]; @@ -1842,7 +1844,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, return addr; =20 flush_tlb_batched_pending(mm); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); do { bool any_skipped =3D false; =20 @@ -1874,7 +1876,7 @@ static unsigned long zap_pte_range(struct mmu_gather = *tlb, direct_reclaim =3D try_get_and_clear_pmd(mm, pmd, &pmdval); =20 add_mm_rss_vec(mm, rss); - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); =20 /* Do the actual TLB flush before dropping ptl */ if (force_flush) { @@ -2811,6 +2813,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_= t *pmd, unsigned long addr, unsigned long end, unsigned long pfn, pgprot_t prot) { + lazy_mmu_state_t lazy_mmu_state; pte_t *pte, *mapped_pte; spinlock_t *ptl; int err =3D 0; @@ -2818,7 +2821,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_= t *pmd, mapped_pte =3D pte =3D pte_alloc_map_lock(mm, pmd, addr, &ptl); if (!pte) return -ENOMEM; - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); do { BUG_ON(!pte_none(ptep_get(pte))); if (!pfn_modify_allowed(pfn, prot)) { @@ -2828,7 +2831,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_= t *pmd, set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); - arch_leave_lazy_mmu_mode(); + 
arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(mapped_pte, ptl); return err; } @@ -3117,6 +3120,7 @@ static int apply_to_pte_range(struct mm_struct *mm, p= md_t *pmd, pte_fn_t fn, void *data, bool create, pgtbl_mod_mask *mask) { + lazy_mmu_state_t lazy_mmu_state; pte_t *pte, *mapped_pte; int err =3D 0; spinlock_t *ptl; @@ -3135,7 +3139,7 @@ static int apply_to_pte_range(struct mm_struct *mm, p= md_t *pmd, return -EINVAL; } =20 - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 if (fn) { do { @@ -3148,7 +3152,7 @@ static int apply_to_pte_range(struct mm_struct *mm, p= md_t *pmd, } *mask |=3D PGTBL_PTE_MODIFIED; =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); =20 if (mm !=3D &init_mm) pte_unmap_unlock(mapped_pte, ptl); diff --git a/mm/migrate_device.c b/mm/migrate_device.c index abd9f6850db6..833ce5eafa40 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -59,6 +59,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, unsigned long end, struct mm_walk *walk) { + lazy_mmu_state_t lazy_mmu_state; struct migrate_vma *migrate =3D walk->private; struct folio *fault_folio =3D migrate->fault_page ? page_folio(migrate->fault_page) : NULL; @@ -110,7 +111,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, ptep =3D pte_offset_map_lock(mm, pmdp, addr, &ptl); if (!ptep) goto again; - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 for (; addr < end; addr +=3D PAGE_SIZE, ptep++) { struct dev_pagemap *pgmap; @@ -287,7 +288,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, if (unmapped) flush_tlb_range(walk->vma, start, end); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(ptep - 1, ptl); =20 return 0; diff --git a/mm/mprotect.c b/mm/mprotect.c index 113b48985834..7bba651e5aa3 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -273,6 +273,7 @@ static long change_pte_range(struct mmu_gather *tlb, struct vm_area_struct *vma, pmd_t *pmd, unsigned long addr, unsigned long end, pgprot_t newprot, unsigned long cp_flags) { + lazy_mmu_state_t lazy_mmu_state; pte_t *pte, oldpte; spinlock_t *ptl; long pages =3D 0; @@ -293,7 +294,7 @@ static long change_pte_range(struct mmu_gather *tlb, target_node =3D numa_node_id(); =20 flush_tlb_batched_pending(vma->vm_mm); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); do { nr_ptes =3D 1; oldpte =3D ptep_get(pte); @@ -439,7 +440,7 @@ static long change_pte_range(struct mmu_gather *tlb, } } } while (pte +=3D nr_ptes, addr +=3D nr_ptes * PAGE_SIZE, addr !=3D end); - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(pte - 1, ptl); =20 return pages; diff --git a/mm/mremap.c b/mm/mremap.c index 35de0a7b910e..a562d8cf1eee 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -193,6 +193,7 @@ static int mremap_folio_pte_batch(struct vm_area_struct= *vma, unsigned long addr static int move_ptes(struct pagetable_move_control *pmc, unsigned long extent, pmd_t *old_pmd, pmd_t *new_pmd) { + lazy_mmu_state_t lazy_mmu_state; struct vm_area_struct *vma =3D pmc->old; bool need_clear_uffd_wp =3D vma_has_uffd_without_event_remap(vma); struct mm_struct *mm =3D vma->vm_mm; @@ -256,7 +257,7 @@ static int move_ptes(struct pagetable_move_control *pmc, if (new_ptl !=3D old_ptl) spin_lock_nested(new_ptl, SINGLE_DEPTH_NESTING); flush_tlb_batched_pending(vma->vm_mm); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 for (; old_addr < old_end; 
old_ptep +=3D nr_ptes, old_addr +=3D nr_ptes *= PAGE_SIZE, new_ptep +=3D nr_ptes, new_addr +=3D nr_ptes * PAGE_SIZE) { @@ -301,7 +302,7 @@ static int move_ptes(struct pagetable_move_control *pmc, } } =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); if (force_flush) flush_tlb_range(vma, old_end - len, old_end); if (new_ptl !=3D old_ptl) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 50aaa8dcd24c..6ee71ba68b12 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -1076,6 +1076,7 @@ static long move_present_ptes(struct mm_struct *mm, struct folio **first_src_folio, unsigned long len, struct anon_vma *src_anon_vma) { + lazy_mmu_state_t lazy_mmu_state; int err =3D 0; struct folio *src_folio =3D *first_src_folio; unsigned long src_start =3D src_addr; @@ -1100,7 +1101,7 @@ static long move_present_ptes(struct mm_struct *mm, /* It's safe to drop the reference now as the page-table is holding one. = */ folio_put(*first_src_folio); *first_src_folio =3D NULL; - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 while (true) { orig_src_pte =3D ptep_get_and_clear(mm, src_addr, src_pte); @@ -1138,7 +1139,7 @@ static long move_present_ptes(struct mm_struct *mm, break; } =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); if (src_addr > src_start) flush_tlb_range(src_vma, src_start, src_addr); =20 diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 4249e1e01947..9fc86ddf1711 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -95,6 +95,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr,= unsigned long end, phys_addr_t phys_addr, pgprot_t prot, unsigned int max_page_shift, pgtbl_mod_mask *mask) { + lazy_mmu_state_t lazy_mmu_state; pte_t *pte; u64 pfn; struct page *page; @@ -105,7 +106,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long add= r, unsigned long end, if (!pte) return -ENOMEM; =20 - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 do { if (unlikely(!pte_none(ptep_get(pte)))) { @@ -131,7 +132,7 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long add= r, unsigned long end, pfn++; } while (pte +=3D PFN_DOWN(size), addr +=3D size, addr !=3D end); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); *mask |=3D PGTBL_PTE_MODIFIED; return 0; } @@ -354,12 +355,13 @@ int ioremap_page_range(unsigned long addr, unsigned l= ong end, static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long= end, pgtbl_mod_mask *mask) { + lazy_mmu_state_t lazy_mmu_state; pte_t *pte; pte_t ptent; unsigned long size =3D PAGE_SIZE; =20 pte =3D pte_offset_kernel(pmd, addr); - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 do { #ifdef CONFIG_HUGETLB_PAGE @@ -378,7 +380,7 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long = addr, unsigned long end, WARN_ON(!pte_none(ptent) && !pte_present(ptent)); } while (pte +=3D (size >> PAGE_SHIFT), addr +=3D size, addr !=3D end); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); *mask |=3D PGTBL_PTE_MODIFIED; } =20 @@ -514,6 +516,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned lo= ng addr, unsigned long end, pgprot_t prot, struct page **pages, int *nr, pgtbl_mod_mask *mask) { + lazy_mmu_state_t lazy_mmu_state; int err =3D 0; pte_t *pte; =20 @@ -526,7 +529,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned lo= ng addr, if (!pte) return -ENOMEM; =20 - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 do 
{ struct page *page =3D pages[*nr]; @@ -548,7 +551,7 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned lo= ng addr, (*nr)++; } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); *mask |=3D PGTBL_PTE_MODIFIED; =20 return err; diff --git a/mm/vmscan.c b/mm/vmscan.c index ca9e1cd3cd68..2872497a0453 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3514,6 +3514,7 @@ static void walk_update_folio(struct lru_gen_mm_walk = *walk, struct folio *folio, static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long = end, struct mm_walk *args) { + lazy_mmu_state_t lazy_mmu_state; int i; bool dirty; pte_t *pte; @@ -3543,7 +3544,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long = start, unsigned long end, return false; } =20 - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); restart: for (i =3D pte_index(start), addr =3D start; addr !=3D end; i++, addr += =3D PAGE_SIZE) { unsigned long pfn; @@ -3584,7 +3585,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long = start, unsigned long end, if (i < PTRS_PER_PTE && get_next_vma(PMD_MASK, PAGE_SIZE, args, &start, &= end)) goto restart; =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); pte_unmap_unlock(pte, ptl); =20 return suitable_to_scan(total, young); @@ -3593,6 +3594,7 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long = start, unsigned long end, static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct v= m_area_struct *vma, struct mm_walk *args, unsigned long *bitmap, unsigned long *first) { + lazy_mmu_state_t lazy_mmu_state; int i; bool dirty; pmd_t *pmd; @@ -3625,7 +3627,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigne= d long addr, struct vm_area if (!spin_trylock(ptl)) goto done; =20 - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 do { unsigned long pfn; @@ -3672,7 +3674,7 @@ static void walk_pmd_range_locked(pud_t *pud, unsigne= d long addr, struct vm_area =20 walk_update_folio(walk, last, gen, dirty); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); spin_unlock(ptl); done: *first =3D -1; @@ -4220,6 +4222,7 @@ static void lru_gen_age_node(struct pglist_data *pgda= t, struct scan_control *sc) */ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + lazy_mmu_state_t lazy_mmu_state; int i; bool dirty; unsigned long start; @@ -4271,7 +4274,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) } } =20 - arch_enter_lazy_mmu_mode(); + lazy_mmu_state =3D arch_enter_lazy_mmu_mode(); =20 pte -=3D (addr - start) / PAGE_SIZE; =20 @@ -4305,7 +4308,7 @@ bool lru_gen_look_around(struct page_vma_mapped_walk = *pvmw) =20 walk_update_folio(walk, last, gen, dirty); =20 - arch_leave_lazy_mmu_mode(); + arch_leave_lazy_mmu_mode(lazy_mmu_state); =20 /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) --=20 2.47.0 From nobody Sun Sep 14 05:14:47 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 7C69B2ECD3A; Mon, 8 Sep 2025 07:40:49 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757317251; cv=none; 
b=b/OoF3L1jfIhBgq91jUzM03ccseWLM3FK3ThlR7xIHwHC+BP6N/SFUUb4hWturR6yuEvRY9oeSdaIqzfK2p8dH2InQvjh2VX0bUNtiiOdN+mtEELaXhJa1KDYx8wyqU6tKHHSdCEk4CmpZPnL8LOrutAlooMrnUdch8X1B/TIek= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1757317251; c=relaxed/simple; bh=hkDsOOLYTwNQnHVD2JN+ks+GUnYIcxo3wNnKCrpM0zU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=LTFOqM9avr/ZNqUx8DDvjiJ+eZvATLkKAtVnl+YYKx++EA2iI6hNnnWs5seNwVwjyl5vF9TvCjQBjlWGnofD8KiplzBv4O+0ElzWSuncFo8a3NOCrhfRG/wC5NyjRSUTaMSHkf35EXNNGxWYr7RAdGuF+wWjFV97q7Y8FuTy9Qc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AEDB91764; Mon, 8 Sep 2025 00:40:40 -0700 (PDT) Received: from e123572-lin.arm.com (e123572-lin.cambridge.arm.com [10.1.194.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 5A6C13F63F; Mon, 8 Sep 2025 00:40:44 -0700 (PDT) From: Kevin Brodsky To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Kevin Brodsky , Alexander Gordeev , Andreas Larsson , Andrew Morton , Boris Ostrovsky , Borislav Petkov , Catalin Marinas , Christophe Leroy , Dave Hansen , David Hildenbrand , "David S. Miller" , "H. Peter Anvin" , Ingo Molnar , Jann Horn , Juergen Gross , "Liam R. Howlett" , Lorenzo Stoakes , Madhavan Srinivasan , Michael Ellerman , Michal Hocko , Mike Rapoport , Nicholas Piggin , Peter Zijlstra , Ryan Roberts , Suren Baghdasaryan , Thomas Gleixner , Vlastimil Babka , Will Deacon , Yeoreum Yun , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v2 3/7] arm64: mm: fully support nested lazy_mmu sections Date: Mon, 8 Sep 2025 08:39:27 +0100 Message-ID: <20250908073931.4159362-4-kevin.brodsky@arm.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250908073931.4159362-1-kevin.brodsky@arm.com> References: <20250908073931.4159362-1-kevin.brodsky@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Despite recent efforts to prevent lazy_mmu sections from nesting, it remains difficult to ensure that it never occurs - and in fact it does occur on arm64 in certain situations (CONFIG_DEBUG_PAGEALLOC). Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested") made nesting tolerable on arm64, but without truly supporting it: the inner leave() call clears TIF_LAZY_MMU, disabling the batching optimisation before the outer section ends. Now that the lazy_mmu API allows enter() to pass through a state to the matching leave() call, we can actually support nesting. If enter() is called inside an active lazy_mmu section, TIF_LAZY_MMU will already be set, and we can then return LAZY_MMU_NESTED to instruct the matching leave() call not to clear TIF_LAZY_MMU. The only effect of this patch is to ensure that TIF_LAZY_MMU (and therefore the batching optimisation) remains set until the outermost lazy_mmu section ends. 
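
As an illustration of the intended behaviour, the following standalone C
model (not kernel code: the thread flag is a plain boolean and barrier
emission is elided) shows how the returned state keeps TIF_LAZY_MMU set
across a nested section and only clears it at the outermost leave().

#include <stdbool.h>
#include <stdio.h>

typedef int lazy_mmu_state_t;
#define LAZY_MMU_DEFAULT 0
#define LAZY_MMU_NESTED  1

static bool tif_lazy_mmu;  /* models TIF_LAZY_MMU */

static lazy_mmu_state_t enter(void)
{
        bool was_set = tif_lazy_mmu;  /* models test_and_set_thread_flag() */

        tif_lazy_mmu = true;
        return was_set ? LAZY_MMU_NESTED : LAZY_MMU_DEFAULT;
}

static void leave(lazy_mmu_state_t state)
{
        /* barriers would be emitted here regardless of nesting */
        if (state != LAZY_MMU_NESTED)
                tif_lazy_mmu = false;
}

int main(void)
{
        lazy_mmu_state_t outer = enter();  /* LAZY_MMU_DEFAULT */
        lazy_mmu_state_t inner = enter();  /* LAZY_MMU_NESTED  */

        leave(inner);
        printf("after inner leave: %d\n", tif_lazy_mmu);  /* still 1 */
        leave(outer);
        printf("after outer leave: %d\n", tif_lazy_mmu);  /* now 0   */
        return 0;
}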
leave() still emits barriers if needed, regardless of the nesting level, as the caller may expect any page table changes to become visible when leave() returns. Signed-off-by: Kevin Brodsky Reviewed-by: Yeoreum Yun --- arch/arm64/include/asm/pgtable.h | 19 +++++-------------- 1 file changed, 5 insertions(+), 14 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 816197d08165..602feda97dc4 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -85,24 +85,14 @@ typedef int lazy_mmu_state_t; =20 static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { - /* - * lazy_mmu_mode is not supposed to permit nesting. But in practice this - * does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation - * inside a lazy_mmu_mode section (such as zap_pte_range()) will change - * permissions on the linear map with apply_to_page_range(), which - * re-enters lazy_mmu_mode. So we tolerate nesting in our - * implementation. The first call to arch_leave_lazy_mmu_mode() will - * flush and clear the flag such that the remainder of the work in the - * outer nest behaves as if outside of lazy mmu mode. This is safe and - * keeps tracking simple. - */ + int lazy_mmu_nested; =20 if (in_interrupt()) return LAZY_MMU_DEFAULT; =20 - set_thread_flag(TIF_LAZY_MMU); + lazy_mmu_nested =3D test_and_set_thread_flag(TIF_LAZY_MMU); =20 - return LAZY_MMU_DEFAULT; + return lazy_mmu_nested ? LAZY_MMU_NESTED : LAZY_MMU_DEFAULT; } =20 static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) @@ -113,7 +103,8 @@ static inline void arch_leave_lazy_mmu_mode(lazy_mmu_st= ate_t state) if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING)) emit_pte_barriers(); =20 - clear_thread_flag(TIF_LAZY_MMU); + if (state !=3D LAZY_MMU_NESTED) + clear_thread_flag(TIF_LAZY_MMU); } =20 #ifdef CONFIG_TRANSPARENT_HUGEPAGE --=20 2.47.0 From nobody Sun Sep 14 05:14:47 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=fail(p=none dis=none) header.from=arm.com Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1757317274519846.4041822017045; Mon, 8 Sep 2025 00:41:14 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1114644.1461503 (Exim 4.92) (envelope-from ) id 1uvWUk-0005s8-7Q; Mon, 08 Sep 2025 07:40:58 +0000 Received: by outflank-mailman (output) from mailman id 1114644.1461503; Mon, 08 Sep 2025 07:40:58 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1uvWUk-0005rt-2b; Mon, 08 Sep 2025 07:40:58 +0000 Received: by outflank-mailman (input) for mailman id 1114644; Mon, 08 Sep 2025 07:40:56 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1uvWUi-0004k6-Gn for xen-devel@lists.xenproject.org; Mon, 08 Sep 2025 07:40:56 +0000 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by se1-gles-flk1.inumbo.com (Halon) with ESMTP id 
26a4c947-8c87-11f0-9809-7dc792cee155; Mon, 08 Sep 2025 09:40:54 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id ACC311BCA; Mon, 8 Sep 2025 00:40:45 -0700 (PDT) Received: from e123572-lin.arm.com (e123572-lin.cambridge.arm.com [10.1.194.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 589A33F63F; Mon, 8 Sep 2025 00:40:49 -0700 (PDT) X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 26a4c947-8c87-11f0-9809-7dc792cee155 From: Kevin Brodsky To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Kevin Brodsky , Alexander Gordeev , Andreas Larsson , Andrew Morton , Boris Ostrovsky , Borislav Petkov , Catalin Marinas , Christophe Leroy , Dave Hansen , David Hildenbrand , "David S. Miller" , "H. Peter Anvin" , Ingo Molnar , Jann Horn , Juergen Gross , "Liam R. Howlett" , Lorenzo Stoakes , Madhavan Srinivasan , Michael Ellerman , Michal Hocko , Mike Rapoport , Nicholas Piggin , Peter Zijlstra , Ryan Roberts , Suren Baghdasaryan , Thomas Gleixner , Vlastimil Babka , Will Deacon , Yeoreum Yun , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v2 4/7] x86/xen: support nested lazy_mmu sections (again) Date: Mon, 8 Sep 2025 08:39:28 +0100 Message-ID: <20250908073931.4159362-5-kevin.brodsky@arm.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250908073931.4159362-1-kevin.brodsky@arm.com> References: <20250908073931.4159362-1-kevin.brodsky@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZM-MESSAGEID: 1757317276963116600 Content-Type: text/plain; charset="utf-8" Commit 49147beb0ccb ("x86/xen: allow nesting of same lazy mode") originally introduced support for nested lazy sections (LAZY_MMU and LAZY_CPU). It later got reverted by commit c36549ff8d84 as its implementation turned out to be intolerant to preemption. Now that the lazy_mmu API allows enter() to pass through a state to the matching leave() call, we can support nesting again for the LAZY_MMU mode in a preemption-safe manner. If xen_enter_lazy_mmu() is called inside an active lazy_mmu section, xen_lazy_mode will already be set to XEN_LAZY_MMU and we can then return LAZY_MMU_NESTED to instruct the matching xen_leave_lazy_mmu() call to leave xen_lazy_mode unchanged. The only effect of this patch is to ensure that xen_lazy_mode remains set to XEN_LAZY_MMU until the outermost lazy_mmu section ends. xen_leave_lazy_mmu() still calls xen_mc_flush() unconditionally. 
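
The same idea, as a compilable model of the Xen callbacks (not kernel
code: the multicall machinery and preemption control are reduced to
stubs). The flush in leave() stays unconditional, while the lazy mode
itself is only exited at the outermost level.

#include <stdio.h>

typedef int lazy_mmu_state_t;
#define LAZY_MMU_DEFAULT 0
#define LAZY_MMU_NESTED  1

enum { XEN_LAZY_NONE, XEN_LAZY_MMU };
static int xen_lazy_mode = XEN_LAZY_NONE;

static void xen_mc_flush(void) { puts("flush multicall batch"); }  /* stub */

static lazy_mmu_state_t xen_enter_lazy_mmu(void)
{
        if (xen_lazy_mode == XEN_LAZY_MMU)
                return LAZY_MMU_NESTED;

        xen_lazy_mode = XEN_LAZY_MMU;
        return LAZY_MMU_DEFAULT;
}

static void xen_leave_lazy_mmu(lazy_mmu_state_t state)
{
        xen_mc_flush();                        /* always flush pending updates */
        if (state != LAZY_MMU_NESTED)
                xen_lazy_mode = XEN_LAZY_NONE; /* exit only at outermost level */
}

int main(void)
{
        lazy_mmu_state_t outer = xen_enter_lazy_mmu();
        lazy_mmu_state_t inner = xen_enter_lazy_mmu();

        xen_leave_lazy_mmu(inner);  /* flushes, mode stays XEN_LAZY_MMU */
        xen_leave_lazy_mmu(outer);  /* flushes, mode back to NONE       */
        return 0;
}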
Signed-off-by: Kevin Brodsky Reviewed-by: Juergen Gross --- arch/x86/include/asm/paravirt.h | 6 ++---- arch/x86/include/asm/paravirt_types.h | 4 ++-- arch/x86/xen/mmu_pv.c | 11 ++++++++--- 3 files changed, 12 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravir= t.h index 65a0d394fba1..4ecd3a6b1dea 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -529,14 +529,12 @@ static inline void arch_end_context_switch(struct tas= k_struct *next) #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void) { - PVOP_VCALL0(mmu.lazy_mode.enter); - - return LAZY_MMU_DEFAULT; + return PVOP_CALL0(lazy_mmu_state_t, mmu.lazy_mode.enter); } =20 static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state) { - PVOP_VCALL0(mmu.lazy_mode.leave); + PVOP_VCALL1(mmu.lazy_mode.leave, state); } =20 static inline void arch_flush_lazy_mmu_mode(void) diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/p= aravirt_types.h index bc1af86868a3..b7c567ccbf32 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -45,8 +45,8 @@ typedef int lazy_mmu_state_t; =20 struct pv_lazy_ops { /* Set deferred update mode, used for batching operations. */ - void (*enter)(void); - void (*leave)(void); + lazy_mmu_state_t (*enter)(void); + void (*leave)(lazy_mmu_state_t); void (*flush)(void); } __no_randomize_layout; #endif diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 2039d5132ca3..6e5390ff06a5 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -2130,9 +2130,13 @@ static void xen_set_fixmap(unsigned idx, phys_addr_t= phys, pgprot_t prot) #endif } =20 -static void xen_enter_lazy_mmu(void) +static lazy_mmu_state_t xen_enter_lazy_mmu(void) { + if (this_cpu_read(xen_lazy_mode) =3D=3D XEN_LAZY_MMU) + return LAZY_MMU_NESTED; + enter_lazy(XEN_LAZY_MMU); + return LAZY_MMU_DEFAULT; } =20 static void xen_flush_lazy_mmu(void) @@ -2167,11 +2171,12 @@ static void __init xen_post_allocator_init(void) pv_ops.mmu.write_cr3 =3D &xen_write_cr3; } =20 -static void xen_leave_lazy_mmu(void) +static void xen_leave_lazy_mmu(lazy_mmu_state_t state) { preempt_disable(); xen_mc_flush(); - leave_lazy(XEN_LAZY_MMU); + if (state !=3D LAZY_MMU_NESTED) + leave_lazy(XEN_LAZY_MMU); preempt_enable(); } =20 --=20 2.47.0 From nobody Sun Sep 14 05:14:47 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) smtp.mailfrom=xen-devel-bounces@lists.xenproject.org; dmarc=fail(p=none dis=none) header.from=arm.com Return-Path: Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) by mx.zohomail.com with SMTPS id 1757317277151398.1715516228046; Mon, 8 Sep 2025 00:41:17 -0700 (PDT) Received: from list by lists.xenproject.org with outflank-mailman.1114649.1461513 (Exim 4.92) (envelope-from ) id 1uvWUo-0006DJ-Ei; Mon, 08 Sep 2025 07:41:02 +0000 Received: by outflank-mailman (output) from mailman id 1114649.1461513; Mon, 08 Sep 2025 07:41:02 +0000 Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 
1uvWUo-0006D2-AH; Mon, 08 Sep 2025 07:41:02 +0000 Received: by outflank-mailman (input) for mailman id 1114649; Mon, 08 Sep 2025 07:41:01 +0000 Received: from se1-gles-flk1-in.inumbo.com ([94.247.172.50] helo=se1-gles-flk1.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.92) (envelope-from ) id 1uvWUn-0004k6-Tz for xen-devel@lists.xenproject.org; Mon, 08 Sep 2025 07:41:01 +0000 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by se1-gles-flk1.inumbo.com (Halon) with ESMTP id 2994595e-8c87-11f0-9809-7dc792cee155; Mon, 08 Sep 2025 09:40:59 +0200 (CEST) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id AE1BA2A2A; Mon, 8 Sep 2025 00:40:50 -0700 (PDT) Received: from e123572-lin.arm.com (e123572-lin.cambridge.arm.com [10.1.194.54]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 538FB3F63F; Mon, 8 Sep 2025 00:40:54 -0700 (PDT) X-Outflank-Mailman: Message body and most headers restored to incoming version X-BeenThere: xen-devel@lists.xenproject.org List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Errors-To: xen-devel-bounces@lists.xenproject.org Precedence: list Sender: "Xen-devel" X-Inumbo-ID: 2994595e-8c87-11f0-9809-7dc792cee155 From: Kevin Brodsky To: linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, Kevin Brodsky , Alexander Gordeev , Andreas Larsson , Andrew Morton , Boris Ostrovsky , Borislav Petkov , Catalin Marinas , Christophe Leroy , Dave Hansen , David Hildenbrand , "David S. Miller" , "H. Peter Anvin" , Ingo Molnar , Jann Horn , Juergen Gross , "Liam R. Howlett" , Lorenzo Stoakes , Madhavan Srinivasan , Michael Ellerman , Michal Hocko , Mike Rapoport , Nicholas Piggin , Peter Zijlstra , Ryan Roberts , Suren Baghdasaryan , Thomas Gleixner , Vlastimil Babka , Will Deacon , Yeoreum Yun , linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, xen-devel@lists.xenproject.org Subject: [PATCH v2 5/7] powerpc/mm: support nested lazy_mmu sections Date: Mon, 8 Sep 2025 08:39:29 +0100 Message-ID: <20250908073931.4159362-6-kevin.brodsky@arm.com> X-Mailer: git-send-email 2.47.0 In-Reply-To: <20250908073931.4159362-1-kevin.brodsky@arm.com> References: <20250908073931.4159362-1-kevin.brodsky@arm.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-ZM-MESSAGEID: 1757317279180116600 Content-Type: text/plain; charset="utf-8" The lazy_mmu API now allows nested sections to be handled by arch code: enter() can return a flag if called inside another lazy_mmu section, so that the matching call to leave() leaves any optimisation enabled. This patch implements that new logic for powerpc: if there is an active batch, then enter() returns LAZY_MMU_NESTED and the matching leave() leaves batch->active set. The preempt_{enable,disable} calls are left untouched as they already handle nesting themselves. TLB flushing is still done in leave() regardless of the nesting level, as the caller may rely on it whether nesting is occurring or not. 
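
For reference, a standalone model of the batch-based variant used here
(not kernel code: the per-CPU batch and preempt_disable()/preempt_enable()
are omitted, and the flush helper is a stub). enter() reports whether the
batch was already active; leave() still flushes any pending entries but
only deactivates the batch at the outermost level.

#include <stdio.h>

typedef int lazy_mmu_state_t;
#define LAZY_MMU_DEFAULT 0
#define LAZY_MMU_NESTED  1

struct tlb_batch { int active; int index; };
static struct tlb_batch batch;  /* stands in for the per-CPU batch */

static void flush_pending(struct tlb_batch *b)  /* stub for __flush_tlb_pending() */
{
        printf("flushing %d pending entries\n", b->index);
        b->index = 0;
}

static lazy_mmu_state_t enter(void)
{
        if (!batch.active) {
                batch.active = 1;
                return LAZY_MMU_DEFAULT;
        }
        return LAZY_MMU_NESTED;
}

static void leave(lazy_mmu_state_t state)
{
        if (batch.index)
                flush_pending(&batch);  /* flush regardless of nesting */
        if (state != LAZY_MMU_NESTED)
                batch.active = 0;       /* outermost leave() only      */
}

int main(void)
{
        lazy_mmu_state_t outer = enter();
        batch.index = 3;
        lazy_mmu_state_t inner = enter();  /* nested: batch stays active */

        leave(inner);
        leave(outer);
        return 0;
}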
Signed-off-by: Kevin Brodsky
---
 arch/powerpc/include/asm/book3s/64/tlbflush-hash.h | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
index c9f1e819e567..e92bce2efca6 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h
@@ -39,9 +39,13 @@ static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
          */
         preempt_disable();
         batch = this_cpu_ptr(&ppc64_tlb_batch);
-        batch->active = 1;
 
-        return LAZY_MMU_DEFAULT;
+        if (!batch->active) {
+                batch->active = 1;
+                return LAZY_MMU_DEFAULT;
+        } else {
+                return LAZY_MMU_NESTED;
+        }
 }
 
 static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
@@ -54,7 +58,10 @@ static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 
         if (batch->index)
                 __flush_tlb_pending(batch);
-        batch->active = 0;
+
+        if (state != LAZY_MMU_NESTED)
+                batch->active = 0;
+
         preempt_enable();
 }
 
-- 
2.47.0

From nobody Sun Sep 14 05:14:47 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Subject: [PATCH v2 6/7] sparc/mm: support nested lazy_mmu sections
Date: Mon, 8 Sep 2025 08:39:30 +0100
Message-ID: <20250908073931.4159362-7-kevin.brodsky@arm.com>
In-Reply-To: <20250908073931.4159362-1-kevin.brodsky@arm.com>
References: <20250908073931.4159362-1-kevin.brodsky@arm.com>

The lazy_mmu API now allows nested sections to be handled by arch code:
enter() can return a flag if called inside another lazy_mmu section, so
that the matching call to leave() leaves any optimisation enabled.

This patch implements that new logic for sparc: if there is an active
batch, then enter() returns LAZY_MMU_NESTED and the matching leave()
leaves batch->active set. The preempt_{enable,disable} calls are left
untouched as they already handle nesting themselves.

TLB flushing is still done in leave() regardless of the nesting level,
as the caller may rely on it whether or not nesting is occurring.

Signed-off-by: Kevin Brodsky
---
 arch/sparc/mm/tlb.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index bf5094b770af..fdc33438b85f 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -56,9 +56,13 @@ lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
 
         preempt_disable();
         tb = this_cpu_ptr(&tlb_batch);
-        tb->active = 1;
 
-        return LAZY_MMU_DEFAULT;
+        if (!tb->active) {
+                tb->active = 1;
+                return LAZY_MMU_DEFAULT;
+        } else {
+                return LAZY_MMU_NESTED;
+        }
 }
 
 void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
@@ -67,7 +71,10 @@ void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
 
         if (tb->tlb_nr)
                 flush_tlb_pending();
-        tb->active = 0;
+
+        if (state != LAZY_MMU_NESTED)
+                tb->active = 0;
+
         preempt_enable();
 }
 
-- 
2.47.0
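
As above, a minimal sketch (not part of the patch) of the flush guarantee
described in the commit message: on sparc an inner leave() still issues
any pending TLB flush, while only the outermost leave() clears tb->active.
The queue_pte_updates() helper is hypothetical:

static void nested_flush_sketch(void)
{
        lazy_mmu_state_t outer, inner;

        outer = arch_enter_lazy_mmu_mode();     /* LAZY_MMU_DEFAULT: tb->active set */
        inner = arch_enter_lazy_mmu_mode();     /* LAZY_MMU_NESTED: tb->active already set */

        /* Hypothetical helper adding entries to the per-CPU tlb_batch. */
        queue_pte_updates();

        /* Flushes via flush_tlb_pending() if tb->tlb_nr != 0; tb->active stays set. */
        arch_leave_lazy_mmu_mode(inner);

        /* Outermost leave: flushes anything still pending and clears tb->active. */
        arch_leave_lazy_mmu_mode(outer);
}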
From nobody Sun Sep 14 05:14:47 2025
From: Kevin Brodsky
To: linux-mm@kvack.org
Subject: [PATCH v2 7/7] mm: update lazy_mmu documentation
Date: Mon, 8 Sep 2025 08:39:31 +0100
Message-ID: <20250908073931.4159362-8-kevin.brodsky@arm.com>
In-Reply-To: <20250908073931.4159362-1-kevin.brodsky@arm.com>
References: <20250908073931.4159362-1-kevin.brodsky@arm.com>

We now support nested lazy_mmu sections on all architectures
implementing the API. Update the API comment accordingly.

Acked-by: Mike Rapoport (Microsoft)
Signed-off-by: Kevin Brodsky
Reviewed-by: Yeoreum Yun
---
 include/linux/pgtable.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index df0eb898b3fc..85cd1fdb914f 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -228,8 +228,18 @@ static inline int pmd_dirty(pmd_t pmd)
  * of the lazy mode. So the implementation must assume preemption may be enabled
  * and cpu migration is possible; it must take steps to be robust against this.
  * (In practice, for user PTE updates, the appropriate page table lock(s) are
- * held, but for kernel PTE updates, no lock is held). Nesting is not permitted
- * and the mode cannot be used in interrupt context.
+ * held, but for kernel PTE updates, no lock is held). The mode cannot be used
+ * in interrupt context.
+ *
+ * Calls may be nested: an arch_{enter,leave}_lazy_mmu_mode() pair may be called
+ * while the lazy MMU mode has already been enabled. An implementation should
+ * handle this using the state returned by enter() and taken by the matching
+ * leave() call; the LAZY_MMU_{DEFAULT,NESTED} flags can be used to indicate
+ * whether this enter/leave pair is nested inside another or not. (It is up to
+ * the implementation to track whether the lazy MMU mode is enabled at any point
+ * in time.) The expectation is that leave() will flush any batched state
+ * unconditionally, but only leave the lazy MMU mode if the passed state is not
+ * LAZY_MMU_NESTED.
  */
 #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 typedef int lazy_mmu_state_t;
-- 
2.47.0
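
For reference, a minimal sketch of the shape an arch implementation might
take under the contract documented above; it is not part of the series,
and the my_batch_*() helpers are hypothetical stand-ins for whatever
per-CPU batching state an architecture maintains:

typedef int lazy_mmu_state_t;   /* mirrors the generic fallback typedef */

static inline lazy_mmu_state_t arch_enter_lazy_mmu_mode(void)
{
        /* Already inside a lazy_mmu section: tell leave() not to disable it. */
        if (my_batch_is_active())
                return LAZY_MMU_NESTED;

        my_batch_activate();
        return LAZY_MMU_DEFAULT;
}

static inline void arch_leave_lazy_mmu_mode(lazy_mmu_state_t state)
{
        /* Flush unconditionally: callers may rely on leave() flushing. */
        my_batch_flush();

        /* Only the outermost leave() actually exits the lazy MMU mode. */
        if (state != LAZY_MMU_NESTED)
                my_batch_deactivate();
}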