From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Madhavan Srinivasan, Michael Ellerman,
    Nicholas Piggin, Christophe Leroy, "David S. Miller", Andreas Larsson,
    Juergen Gross, Ajay Kaher, Alexey Makhalov, Thomas Gleixner, Ingo Molnar,
    Borislav Petkov, Dave Hansen, "H. Peter Anvin", Boris Ostrovsky,
    "Aneesh Kumar K.V", Andrew Morton, Peter Zijlstra, Arnd Bergmann,
    David Hildenbrand, Lorenzo Stoakes, "Liam R. Howlett", Vlastimil Babka,
    Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Alexei Starovoitov,
    Andrey Ryabinin
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
    linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
    sparclinux@vger.kernel.org, virtualization@lists.linux.dev,
    xen-devel@lists.xenproject.org, linux-mm@kvack.org
Subject: [RFC PATCH v1 1/6] fs/proc/task_mmu: Fix pte update and tlb maintenance ordering in pagemap_scan_pmd_entry()
Date: Fri, 30 May 2025 15:04:39 +0100
Message-ID: <20250530140446.2387131-2-ryan.roberts@arm.com>
In-Reply-To: <20250530140446.2387131-1-ryan.roberts@arm.com>

pagemap_scan_pmd_entry() was previously modifying ptes while in lazy mmu
mode, then performing tlb maintenance for the modified ptes, and only then
leaving lazy mmu mode. But any pte modifications made during lazy mmu mode
may be deferred until arch_leave_lazy_mmu_mode(), inverting the required
ordering between pte modification and tlb maintenance.

Let's fix that by leaving lazy mmu mode first, forcing all the pte updates
to be actioned, before doing the tlb maintenance.
This is a theoretical bug discovered during code review.

Fixes: 52526ca7fdb9 ("fs/proc/task_mmu: implement IOCTL to get and optionally clear info about PTEs")
Signed-off-by: Ryan Roberts
---
 fs/proc/task_mmu.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 994cde10e3f4..361f3ffd9a0c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2557,10 +2557,9 @@ static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
         }

 flush_and_return:
+        arch_leave_lazy_mmu_mode();
         if (flush_end)
                 flush_tlb_range(vma, start, addr);
-
-        arch_leave_lazy_mmu_mode();
         pte_unmap_unlock(start_pte, ptl);

         cond_resched();
-- 
2.43.0
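For reference, the ordering contract that this fix (and the next one) restores
can be sketched roughly as below. This is an illustrative fragment rather than
code from the patch; modify_ptes() is a hypothetical stand-in for whatever
batched pte updates the caller performs, while the other calls are the
existing kernel APIs.

static void example_pte_walk(struct vm_area_struct *vma, pte_t *start_pte,
                             spinlock_t *ptl, unsigned long start,
                             unsigned long end)
{
        bool modified;

        arch_enter_lazy_mmu_mode();

        /* Updates made here may be queued rather than written immediately. */
        modified = modify_ptes(start_pte, start, end);

        /* Leaving lazy mmu mode forces any deferred pte writes to land. */
        arch_leave_lazy_mmu_mode();

        /* Only now is a tlb flush guaranteed to see the new pte values. */
        if (modified)
                flush_tlb_range(vma, start, end);

        pte_unmap_unlock(start_pte, ptl);
}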
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexei Starovoitov , Andrey Ryabinin Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, virtualization@lists.linux.dev, xen-devel@lists.xenproject.org, linux-mm@kvack.org Subject: [RFC PATCH v1 2/6] mm: Fix pte update and tlb maintenance ordering in migrate_vma_collect_pmd() Date: Fri, 30 May 2025 15:04:40 +0100 Message-ID: <20250530140446.2387131-3-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250530140446.2387131-1-ryan.roberts@arm.com> References: <20250530140446.2387131-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" migrate_vma_collect_pmd() was previously modifying ptes while in lazy mmu mode, then performing tlb maintenance for the modified ptes, then leaving lazy mmu mode. But any pte modifications during lazy mmu mode may be deferred until arch_leave_lazy_mmu_mode(), inverting the required ordering between pte modificaiton and tlb maintenance. Let's fix that by leaving mmu mode (forcing all the pte updates to be actioned) before doing the tlb maintenance. This is a theorectical bug discovered during code review. Fixes: 60bae7370896 ("mm/migrate_device.c: flush TLB while holding PTL") Signed-off-by: Ryan Roberts --- mm/migrate_device.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 3158afe7eb23..fc73a940c112 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -283,11 +283,12 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, migrate->src[migrate->npages++] =3D mpfn; } =20 + arch_leave_lazy_mmu_mode(); + /* Only flush the TLB if we actually modified any entries */ if (unmapped) flush_tlb_range(walk->vma, start, end); =20 - arch_leave_lazy_mmu_mode(); pte_unmap_unlock(ptep - 1, ptl); =20 return 0; --=20 2.43.0 From nobody Fri Oct 31 11:33:01 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B001C22C35E; Fri, 30 May 2025 14:05:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748613923; cv=none; b=IjETalERUjj+bKcsj5b5ZacWHiEb9MToN0Jwrj6wR9qOeU8cNzbyCS2C8DNfetyYv9EfSiJ3jh7llftLrMwoYb9CiOrCGSTil7joX7sG4v8NQ0+fZLukiYZDjCGtHMHDDfWRwXfeQUgdm+KFTjLpCLizNhGXecDuuLggrGzoyV0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1748613923; c=relaxed/simple; bh=3GcFcLOYmt/iurJiDf2/ca7cZA/iimYs1rjmr31U4Jw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=leiK+vbsE/RqWDckvH1HVISX8sSl9mIBoVMivjscAZWAueQCk7quZwYk0aHopbCxWfrDB4urrPpnTPyBoIMST65zgErXEMnBGaAsTaRIxiFlFvgKrWGQ6iBBsIsjO9ZV5n69AVz/1LaIote+9DdpLIlVAP3Ov3dIDJrakhEdpJU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) 
From: Ryan Roberts
Subject: [RFC PATCH v1 3/6] mm: Avoid calling page allocator from apply_to_page_range()
Date: Fri, 30 May 2025 15:04:41 +0100
Message-ID: <20250530140446.2387131-4-ryan.roberts@arm.com>
In-Reply-To: <20250530140446.2387131-1-ryan.roberts@arm.com>

Lazy mmu mode applies to the current task and permits pte modifications to
be deferred and updated at a later time in a batch to improve performance.
apply_to_page_range() calls its callback in lazy mmu mode, and some of
those callbacks call into the page allocator to either allocate or free
pages.

This is problematic with CONFIG_DEBUG_PAGEALLOC because
debug_pagealloc_[un]map_pages() calls the arch implementation of
__kernel_map_pages(), which must modify the ptes for the linear map. There
are two possibilities at this point:

- If the arch implementation modifies the ptes directly without first
  entering lazy mmu mode, the pte modifications may get deferred until the
  existing lazy mmu mode is exited. This could result in taking spurious
  faults, for example.

- If the arch implementation enters a nested lazy mmu mode before
  modifying the ptes (many arches use apply_to_page_range()), then the
  linear map updates will definitely be applied upon leaving the inner
  lazy mmu mode. But because lazy mmu mode does not support nesting, the
  remainder of the outer user is no longer in lazy mmu mode and the
  optimization opportunity is lost.

So let's just ensure that the page allocator is never called from within
lazy mmu mode. New "_nolazy" variants of apply_to_page_range() and
apply_to_existing_page_range() are introduced which don't enter lazy mmu
mode. Then users that need to call into the page allocator within their
callback are updated to use the _nolazy variants.
Signed-off-by: Ryan Roberts
---
 include/linux/mm.h |  6 ++++++
 kernel/bpf/arena.c |  6 +++---
 mm/kasan/shadow.c  |  2 +-
 mm/memory.c        | 54 +++++++++++++++++++++++++++++++++++-----------
 4 files changed, 51 insertions(+), 17 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e51dba8398f7..11cae6ce04ff 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3743,9 +3743,15 @@ static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
 typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
 extern int apply_to_page_range(struct mm_struct *mm, unsigned long address,
                                unsigned long size, pte_fn_t fn, void *data);
+extern int apply_to_page_range_nolazy(struct mm_struct *mm,
+                               unsigned long address, unsigned long size,
+                               pte_fn_t fn, void *data);
 extern int apply_to_existing_page_range(struct mm_struct *mm,
                                    unsigned long address, unsigned long size,
                                    pte_fn_t fn, void *data);
+extern int apply_to_existing_page_range_nolazy(struct mm_struct *mm,
+                                   unsigned long address, unsigned long size,
+                                   pte_fn_t fn, void *data);

 #ifdef CONFIG_PAGE_POISONING
 extern void __kernel_poison_pages(struct page *page, int numpages);
diff --git a/kernel/bpf/arena.c b/kernel/bpf/arena.c
index 0d56cea71602..ca833cfeefb7 100644
--- a/kernel/bpf/arena.c
+++ b/kernel/bpf/arena.c
@@ -187,10 +187,10 @@ static void arena_map_free(struct bpf_map *map)
         /*
          * free_vm_area() calls remove_vm_area() that calls free_unmap_vmap_area().
          * It unmaps everything from vmalloc area and clears pgtables.
-         * Call apply_to_existing_page_range() first to find populated ptes and
-         * free those pages.
+         * Call apply_to_existing_page_range_nolazy() first to find populated
+         * ptes and free those pages.
          */
-        apply_to_existing_page_range(&init_mm, bpf_arena_get_kern_vm_start(arena),
+        apply_to_existing_page_range_nolazy(&init_mm, bpf_arena_get_kern_vm_start(arena),
                                      KERN_VM_SZ - GUARD_SZ, existing_page_cb, NULL);
         free_vm_area(arena->kern_vm);
         range_tree_destroy(&arena->rt);
diff --git a/mm/kasan/shadow.c b/mm/kasan/shadow.c
index d2c70cd2afb1..2325c5166c3a 100644
--- a/mm/kasan/shadow.c
+++ b/mm/kasan/shadow.c
@@ -590,7 +590,7 @@ void kasan_release_vmalloc(unsigned long start, unsigned long end,


         if (flags & KASAN_VMALLOC_PAGE_RANGE)
-                apply_to_existing_page_range(&init_mm,
+                apply_to_existing_page_range_nolazy(&init_mm,
                                              (unsigned long)shadow_start,
                                              size, kasan_depopulate_vmalloc_pte,
                                              NULL);
diff --git a/mm/memory.c b/mm/memory.c
index 49199410805c..24436074ce48 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2913,7 +2913,7 @@ EXPORT_SYMBOL(vm_iomap_memory);
 static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
                                      unsigned long addr, unsigned long end,
                                      pte_fn_t fn, void *data, bool create,
-                                     pgtbl_mod_mask *mask)
+                                     pgtbl_mod_mask *mask, bool lazy_mmu)
 {
         pte_t *pte, *mapped_pte;
         int err = 0;
@@ -2933,7 +2933,8 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
                         return -EINVAL;
         }

-        arch_enter_lazy_mmu_mode();
+        if (lazy_mmu)
+                arch_enter_lazy_mmu_mode();

         if (fn) {
                 do {
@@ -2946,7 +2947,8 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
         }
         *mask |= PGTBL_PTE_MODIFIED;

-        arch_leave_lazy_mmu_mode();
+        if (lazy_mmu)
+                arch_leave_lazy_mmu_mode();

         if (mm != &init_mm)
                 pte_unmap_unlock(mapped_pte, ptl);
@@ -2956,7 +2958,7 @@ static int apply_to_pte_range(struct mm_struct *mm, pmd_t *pmd,
 static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
                                      unsigned long addr, unsigned long end,
                                      pte_fn_t fn, void *data, bool create,
-                                     pgtbl_mod_mask *mask)
+                                     pgtbl_mod_mask *mask, bool lazy_mmu)
 {
         pmd_t *pmd;
         unsigned long next;
@@ -2983,7 +2985,7 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
                                 pmd_clear_bad(pmd);
                 }
                 err = apply_to_pte_range(mm, pmd, addr, next,
-                                         fn, data, create, mask);
+                                         fn, data, create, mask, lazy_mmu);
                 if (err)
                         break;
         } while (pmd++, addr = next, addr != end);
@@ -2994,7 +2996,7 @@ static int apply_to_pmd_range(struct mm_struct *mm, pud_t *pud,
 static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
                                      unsigned long addr, unsigned long end,
                                      pte_fn_t fn, void *data, bool create,
-                                     pgtbl_mod_mask *mask)
+                                     pgtbl_mod_mask *mask, bool lazy_mmu)
 {
         pud_t *pud;
         unsigned long next;
@@ -3019,7 +3021,7 @@ static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
                                 pud_clear_bad(pud);
                 }
                 err = apply_to_pmd_range(mm, pud, addr, next,
-                                         fn, data, create, mask);
+                                         fn, data, create, mask, lazy_mmu);
                 if (err)
                         break;
         } while (pud++, addr = next, addr != end);
@@ -3030,7 +3032,7 @@ static int apply_to_pud_range(struct mm_struct *mm, p4d_t *p4d,
 static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
                                      unsigned long addr, unsigned long end,
                                      pte_fn_t fn, void *data, bool create,
-                                     pgtbl_mod_mask *mask)
+                                     pgtbl_mod_mask *mask, bool lazy_mmu)
 {
         p4d_t *p4d;
         unsigned long next;
@@ -3055,7 +3057,7 @@ static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,
                                 p4d_clear_bad(p4d);
                 }
                 err = apply_to_pud_range(mm, p4d, addr, next,
-                                         fn, data, create, mask);
+                                         fn, data, create, mask, lazy_mmu);
                 if (err)
                         break;
         } while (p4d++, addr = next, addr != end);
@@ -3065,7 +3067,7 @@ static int apply_to_p4d_range(struct mm_struct *mm, pgd_t *pgd,

 static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
                                  unsigned long size, pte_fn_t fn,
-                                 void *data, bool create)
+                                 void *data, bool create, bool lazy_mmu)
 {
         pgd_t *pgd;
         unsigned long start = addr, next;
@@ -3091,7 +3093,7 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
                                 pgd_clear_bad(pgd);
                 }
                 err = apply_to_p4d_range(mm, pgd, addr, next,
-                                         fn, data, create, &mask);
+                                         fn, data, create, &mask, lazy_mmu);
                 if (err)
                         break;
         } while (pgd++, addr = next, addr != end);
@@ -3105,11 +3107,14 @@ static int __apply_to_page_range(struct mm_struct *mm, unsigned long addr,
 /*
  * Scan a region of virtual memory, filling in page tables as necessary
  * and calling a provided function on each leaf page table.
+ *
+ * fn() is called in lazy mmu mode. As a result, the callback must be careful
+ * not to perform memory allocation.
  */
 int apply_to_page_range(struct mm_struct *mm, unsigned long addr,
                         unsigned long size, pte_fn_t fn, void *data)
 {
-        return __apply_to_page_range(mm, addr, size, fn, data, true);
+        return __apply_to_page_range(mm, addr, size, fn, data, true, true);
 }
 EXPORT_SYMBOL_GPL(apply_to_page_range);

@@ -3117,13 +3122,36 @@ EXPORT_SYMBOL_GPL(apply_to_page_range);
  * Scan a region of virtual memory, calling a provided function on
  * each leaf page table where it exists.
  *
+ * fn() is called in lazy mmu mode. As a result, the callback must be careful
+ * not to perform memory allocation.
+ *
  * Unlike apply_to_page_range, this does _not_ fill in page tables
  * where they are absent.
  */
 int apply_to_existing_page_range(struct mm_struct *mm, unsigned long addr,
                                  unsigned long size, pte_fn_t fn, void *data)
 {
-        return __apply_to_page_range(mm, addr, size, fn, data, false);
+        return __apply_to_page_range(mm, addr, size, fn, data, false, true);
+}
+
+/*
+ * As per apply_to_page_range() but fn() is not called in lazy mmu mode.
+ */
+int apply_to_page_range_nolazy(struct mm_struct *mm, unsigned long addr,
+                               unsigned long size, pte_fn_t fn, void *data)
+{
+        return __apply_to_page_range(mm, addr, size, fn, data, true, false);
+}
+
+/*
+ * As per apply_to_existing_page_range() but fn() is not called in lazy mmu
+ * mode.
+ */
+int apply_to_existing_page_range_nolazy(struct mm_struct *mm,
+                                 unsigned long addr, unsigned long size,
+                                 pte_fn_t fn, void *data)
+{
+        return __apply_to_page_range(mm, addr, size, fn, data, false, false);
 }

 /*
-- 
2.43.0
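As a rough illustration of the kind of user the new variants are for, the
sketch below frees the page behind each populated pte from its callback, so it
must not run under lazy mmu mode. free_populated_pte() and
free_example_range() are hypothetical names invented for the example; only
apply_to_existing_page_range_nolazy() comes from the patch.

static int free_populated_pte(pte_t *pte, unsigned long addr, void *data)
{
        pte_t old = ptep_get(pte);

        if (pte_none(old))
                return 0;

        /* May reach __kernel_map_pages() with CONFIG_DEBUG_PAGEALLOC. */
        __free_page(pte_page(old));
        pte_clear(&init_mm, addr, pte);
        return 0;
}

static void free_example_range(unsigned long start, unsigned long size)
{
        /* The callback frees pages, so don't run it in lazy mmu mode. */
        apply_to_existing_page_range_nolazy(&init_mm, start, size,
                                            free_populated_pte, NULL);
}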
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexei Starovoitov , Andrey Ryabinin Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, virtualization@lists.linux.dev, xen-devel@lists.xenproject.org, linux-mm@kvack.org Subject: [RFC PATCH v1 4/6] mm: Introduce arch_in_lazy_mmu_mode() Date: Fri, 30 May 2025 15:04:42 +0100 Message-ID: <20250530140446.2387131-5-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250530140446.2387131-1-ryan.roberts@arm.com> References: <20250530140446.2387131-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Introduce new arch_in_lazy_mmu_mode() API, which returns true if the calling context is currently in lazy mmu mode or false otherwise. Each arch that supports lazy mmu mode must provide an implementation of this API. The API will shortly be used to prevent accidental lazy mmu mode nesting when performing an allocation, and will additionally be used to ensure pte modification vs tlb flushing order does not get inadvertantly swapped. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 8 ++++++++ .../powerpc/include/asm/book3s/64/tlbflush-hash.h | 15 +++++++++++++++ arch/sparc/include/asm/tlbflush_64.h | 1 + arch/sparc/mm/tlb.c | 12 ++++++++++++ arch/x86/include/asm/paravirt.h | 5 +++++ arch/x86/include/asm/paravirt_types.h | 1 + arch/x86/kernel/paravirt.c | 6 ++++++ arch/x86/xen/mmu_pv.c | 6 ++++++ include/linux/pgtable.h | 1 + 9 files changed, 55 insertions(+) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 5285757ee0c1..add75dee49f5 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -119,6 +119,14 @@ static inline void arch_leave_lazy_mmu_mode(void) clear_thread_flag(TIF_LAZY_MMU); } =20 +static inline bool arch_in_lazy_mmu_mode(void) +{ + if (in_interrupt()) + return false; + + return test_thread_flag(TIF_LAZY_MMU); +} + #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE =20 diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h b/arch/powe= rpc/include/asm/book3s/64/tlbflush-hash.h index 146287d9580f..4123a9da32cc 100644 --- a/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h +++ b/arch/powerpc/include/asm/book3s/64/tlbflush-hash.h @@ -57,6 +57,21 @@ static inline void arch_leave_lazy_mmu_mode(void) =20 #define arch_flush_lazy_mmu_mode() do {} while (0) =20 +static inline bool arch_in_lazy_mmu_mode(void) +{ + struct ppc64_tlb_batch *batch; + bool active; + + if (radix_enabled()) + return false; + + batch =3D get_cpu_ptr(&ppc64_tlb_batch); + active =3D batch->active; + put_cpu_ptr(&ppc64_tlb_batch); + + return active; +} + extern void hash__tlbiel_all(unsigned int action); =20 extern void flush_hash_page(unsigned long vpn, real_pte_t pte, int psize, diff --git a/arch/sparc/include/asm/tlbflush_64.h b/arch/sparc/include/asm/= tlbflush_64.h index 8b8cdaa69272..204bc957df9e 100644 --- a/arch/sparc/include/asm/tlbflush_64.h +++ b/arch/sparc/include/asm/tlbflush_64.h @@ -45,6 +45,7 @@ void flush_tlb_pending(void); void arch_enter_lazy_mmu_mode(void); void arch_leave_lazy_mmu_mode(void); #define arch_flush_lazy_mmu_mode() do {} while (0) +bool arch_in_lazy_mmu_mode(void); =20 /* Local cpu only. 
*/ void __flush_tlb_all(void); diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c index a35ddcca5e76..83ab4ba4f4fb 100644 --- a/arch/sparc/mm/tlb.c +++ b/arch/sparc/mm/tlb.c @@ -69,6 +69,18 @@ void arch_leave_lazy_mmu_mode(void) preempt_enable(); } =20 +bool arch_in_lazy_mmu_mode(void) +{ + struct tlb_batch *tb; + bool active; + + tb =3D get_cpu_ptr(&tlb_batch); + active =3D tb->active; + put_cpu_ptr(&tlb_batch); + + return active; +} + static void tlb_batch_add_one(struct mm_struct *mm, unsigned long vaddr, bool exec, unsigned int hugepage_shift) { diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravir= t.h index b5e59a7ba0d0..c7ea3ccb8a41 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -542,6 +542,11 @@ static inline void arch_flush_lazy_mmu_mode(void) PVOP_VCALL0(mmu.lazy_mode.flush); } =20 +static inline bool arch_in_lazy_mmu_mode(void) +{ + return PVOP_CALL0(bool, mmu.lazy_mode.in); +} + static inline void __set_fixmap(unsigned /* enum fixed_addresses */ idx, phys_addr_t phys, pgprot_t flags) { diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/p= aravirt_types.h index 37a8627d8277..41001ca9d010 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -46,6 +46,7 @@ struct pv_lazy_ops { void (*enter)(void); void (*leave)(void); void (*flush)(void); + bool (*in)(void); } __no_randomize_layout; #endif =20 diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index ab3e172dcc69..9af1a04a47fd 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -106,6 +106,11 @@ static noinstr void pv_native_set_debugreg(int regno, = unsigned long val) { native_set_debugreg(regno, val); } + +static noinstr bool paravirt_retfalse(void) +{ + return false; +} #endif =20 struct pv_info pv_info =3D { @@ -228,6 +233,7 @@ struct paravirt_patch_template pv_ops =3D { .enter =3D paravirt_nop, .leave =3D paravirt_nop, .flush =3D paravirt_nop, + .in =3D paravirt_retfalse, }, =20 .mmu.set_fixmap =3D native_set_fixmap, diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c index 2a4a8deaf612..74f7a8537911 100644 --- a/arch/x86/xen/mmu_pv.c +++ b/arch/x86/xen/mmu_pv.c @@ -2147,6 +2147,11 @@ static void xen_flush_lazy_mmu(void) preempt_enable(); } =20 +static bool xen_in_lazy_mmu(void) +{ + return xen_get_lazy_mode() =3D=3D XEN_LAZY_MMU; +} + static void __init xen_post_allocator_init(void) { pv_ops.mmu.set_pte =3D xen_set_pte; @@ -2230,6 +2235,7 @@ static const typeof(pv_ops) xen_mmu_ops __initconst = =3D { .enter =3D xen_enter_lazy_mmu, .leave =3D xen_leave_lazy_mmu, .flush =3D xen_flush_lazy_mmu, + .in =3D xen_in_lazy_mmu, }, =20 .set_fixmap =3D xen_set_fixmap, diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index b50447ef1c92..580d9971f435 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -235,6 +235,7 @@ static inline int pmd_dirty(pmd_t pmd) #define arch_enter_lazy_mmu_mode() do {} while (0) #define arch_leave_lazy_mmu_mode() do {} while (0) #define arch_flush_lazy_mmu_mode() do {} while (0) +#define arch_in_lazy_mmu_mode() false #endif =20 #ifndef pte_batch_hint --=20 2.43.0 From nobody Fri Oct 31 11:33:01 2025 Delivered-To: importer@patchew.org Received-SPF: pass (zohomail.com: domain of lists.xenproject.org designates 192.237.175.120 as permitted sender) client-ip=192.237.175.120; envelope-from=xen-devel-bounces@lists.xenproject.org; helo=lists.xenproject.org; Authentication-Results: mx.zohomail.com; spf=pass 
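For illustration, the kind of caller-side usage the new API enables (and which
the next patch applies to the mmu_gather code) looks roughly like the sketch
below; alloc_outside_lazy_mmu() is a hypothetical helper rather than something
this series adds.

static struct page *alloc_outside_lazy_mmu(gfp_t gfp)
{
        bool lazy_mmu = arch_in_lazy_mmu_mode();
        struct page *page;

        /*
         * Drop out of lazy mmu mode so that any linear map pte updates made
         * by the allocator (e.g. with CONFIG_DEBUG_PAGEALLOC) are applied
         * immediately and cannot nest another lazy mmu section.
         */
        if (lazy_mmu)
                arch_leave_lazy_mmu_mode();

        page = alloc_page(gfp);

        if (lazy_mmu)
                arch_enter_lazy_mmu_mode();

        return page;
}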
From: Ryan Roberts
Subject: [RFC PATCH v1 5/6] mm: Avoid calling page allocator while in lazy mmu mode
Date: Fri, 30 May 2025 15:04:43 +0100
Message-ID: <20250530140446.2387131-6-ryan.roberts@arm.com>
In-Reply-To: <20250530140446.2387131-1-ryan.roberts@arm.com>

Lazy mmu mode applies to the current task and permits pte modifications to
be deferred and updated at a later time in a batch to improve performance.
tlb_next_batch() is called in lazy mmu mode as follows:

  zap_pte_range
    arch_enter_lazy_mmu_mode
    do_zap_pte_range
      zap_present_ptes
        zap_present_folio_ptes
          __tlb_remove_folio_pages
            __tlb_remove_folio_pages_size
              tlb_next_batch
    arch_leave_lazy_mmu_mode

tlb_next_batch() may call into the page allocator, which is problematic
with CONFIG_DEBUG_PAGEALLOC because debug_pagealloc_[un]map_pages() calls
the arch implementation of __kernel_map_pages(), which must modify the
ptes for the linear map. There are two possibilities at this point:

- If the arch implementation modifies the ptes directly without first
  entering lazy mmu mode, the pte modifications may get deferred until the
  existing lazy mmu mode is exited. This could result in taking spurious
  faults, for example.

- If the arch implementation enters a nested lazy mmu mode before
  modifying the ptes (many arches use apply_to_page_range()), then the
  linear map updates will definitely be applied upon leaving the inner
  lazy mmu mode. But because lazy mmu mode does not support nesting, the
  remainder of the outer user is no longer in lazy mmu mode and the
  optimization opportunity is lost.

So let's just ensure that the page allocator is never called from within
lazy mmu mode. Use the new arch_in_lazy_mmu_mode() API to check whether we
are in lazy mmu mode and, if so, temporarily leave it around the call into
the page allocator.

Given this new API, we can also add VM_WARN_ON()s to check that we have
exited lazy mmu mode where required, ensuring the PTEs are actually
updated prior to tlb flushing.

Signed-off-by: Ryan Roberts
---
 include/asm-generic/tlb.h |  2 ++
 mm/mmu_gather.c           | 15 +++++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 88a42973fa47..84fb269b78a5 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -469,6 +469,8 @@ tlb_update_vma_flags(struct mmu_gather *tlb, struct vm_area_struct *vma)

 static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
 {
+        VM_WARN_ON(arch_in_lazy_mmu_mode());
+
         /*
          * Anything calling __tlb_adjust_range() also sets at least one of
          * these bits.
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index db7ba4a725d6..0bd1e69b048b 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -18,6 +18,7 @@
 static bool tlb_next_batch(struct mmu_gather *tlb)
 {
         struct mmu_gather_batch *batch;
+        bool lazy_mmu;

         /* Limit batching if we have delayed rmaps pending */
         if (tlb->delayed_rmap && tlb->active != &tlb->local)
@@ -32,7 +33,15 @@ static bool tlb_next_batch(struct mmu_gather *tlb)
         if (tlb->batch_count == MAX_GATHER_BATCH_COUNT)
                 return false;

+        lazy_mmu = arch_in_lazy_mmu_mode();
+        if (lazy_mmu)
+                arch_leave_lazy_mmu_mode();
+
         batch = (void *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
+
+        if (lazy_mmu)
+                arch_enter_lazy_mmu_mode();
+
         if (!batch)
                 return false;

@@ -145,6 +154,8 @@ static void tlb_batch_pages_flush(struct mmu_gather *tlb)
 {
         struct mmu_gather_batch *batch;

+        VM_WARN_ON(arch_in_lazy_mmu_mode());
+
         for (batch = &tlb->local; batch && batch->nr; batch = batch->next)
                 __tlb_batch_free_encoded_pages(batch);
         tlb->active = &tlb->local;
@@ -154,6 +165,8 @@ static void tlb_batch_list_free(struct mmu_gather *tlb)
 {
         struct mmu_gather_batch *batch, *next;

+        VM_WARN_ON(arch_in_lazy_mmu_mode());
+
         for (batch = tlb->local.next; batch; batch = next) {
                 next = batch->next;
                 free_pages((unsigned long)batch, 0);
@@ -363,6 +376,8 @@ void tlb_remove_table(struct mmu_gather *tlb, void *table)
 {
         struct mmu_table_batch **batch = &tlb->batch;

+        VM_WARN_ON(arch_in_lazy_mmu_mode());
+
         if (*batch == NULL) {
                 *batch = (struct mmu_table_batch *)__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
                 if (*batch == NULL) {
-- 
2.43.0
Miller" , Andreas Larsson , Juergen Gross , Ajay Kaher , Alexey Makhalov , Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , "H. Peter Anvin" , Boris Ostrovsky , "Aneesh Kumar K.V" , Andrew Morton , Peter Zijlstra , Arnd Bergmann , David Hildenbrand , Lorenzo Stoakes , "Liam R. Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Alexei Starovoitov , Andrey Ryabinin Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, sparclinux@vger.kernel.org, virtualization@lists.linux.dev, xen-devel@lists.xenproject.org, linux-mm@kvack.org Subject: [RFC PATCH v1 6/6] Revert "arm64/mm: Permit lazy_mmu_mode to be nested" Date: Fri, 30 May 2025 15:04:44 +0100 Message-ID: <20250530140446.2387131-7-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250530140446.2387131-1-ryan.roberts@arm.com> References: <20250530140446.2387131-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Commit 491344301b25 ("arm64/mm: Permit lazy_mmu_mode to be nested") made the arm64 implementation of lazy_mmu_mode tolerant to nesting. But subsequent commits have fixed the core code to ensure that lazy_mmu_mode never gets nested (as originally intended). Therefore we can revert this commit and reinstate the VM_WARN() if nesting is detected in future. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index add75dee49f5..dcf0adbeb803 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -83,21 +83,11 @@ static inline void queue_pte_barriers(void) #define __HAVE_ARCH_ENTER_LAZY_MMU_MODE static inline void arch_enter_lazy_mmu_mode(void) { - /* - * lazy_mmu_mode is not supposed to permit nesting. But in practice this - * does happen with CONFIG_DEBUG_PAGEALLOC, where a page allocation - * inside a lazy_mmu_mode section (such as zap_pte_range()) will change - * permissions on the linear map with apply_to_page_range(), which - * re-enters lazy_mmu_mode. So we tolerate nesting in our - * implementation. The first call to arch_leave_lazy_mmu_mode() will - * flush and clear the flag such that the remainder of the work in the - * outer nest behaves as if outside of lazy mmu mode. This is safe and - * keeps tracking simple. - */ - if (in_interrupt()) return; =20 + VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU)); + set_thread_flag(TIF_LAZY_MMU); } =20 --=20 2.43.0