From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 19EBC1662EF; Wed, 5 Feb 2025 15:10:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768233; cv=none; b=Zr/H5jX0TCl9DBjhTHB+8ghVqmIblWeho0W7brChBFN39Fvbk4G9q5Gph9GWsZPZDGtdfmtGlIhyUdNsHEQtrC46+5iNKglsGEV+X0EQvQzLMR1OqvkvNGFbupzyGNfBMIJO3/s0/EVa2utON9L4I7WInReYVqNiKUEr1WKE2hc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768233; c=relaxed/simple; bh=0I1T16cHR3wAmg8N9hea5o9OSv+qSCJxkxldhLQxsY4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jI3pAqLayFh57K7RRagi+OGy02Phss8QzOItH/O15jWpCF4BjseiTmvvQbWlugz5Hkza5qGd18Ch2aI/Os4jXe10li6hFJOYQcQ78iGrnOxBONTLXulHBt2cA8PYVuWs6f+dli3zAko54TH2CrqO9At2pvBCDfZ7XO3ObY5TmiM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 84921165C; Wed, 5 Feb 2025 07:10:53 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 438EB3F5A1; Wed, 5 Feb 2025 07:10:25 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Huacai Chen , Thomas Bogendoerfer , "James E.J. Bottomley" , Helge Deller , Madhavan Srinivasan , Michael Ellerman , Paul Walmsley , Palmer Dabbelt , Albert Ou , Heiko Carstens , Vasily Gorbik , Alexander Gordeev , Gerald Schaefer , "David S. Miller" , Andreas Larsson , stable@vger.kernel.org Subject: [PATCH v1 01/16] mm: hugetlb: Add huge page size param to huge_ptep_get_and_clear() Date: Wed, 5 Feb 2025 15:09:41 +0000 Message-ID: <20250205151003.88959-2-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" In order to fix a bug, arm64 needs to be told the size of the huge page for which the huge_pte is being set in huge_ptep_get_and_clear(). Provide for this by adding an `unsigned long sz` parameter to the function. This follows the same pattern as huge_pte_clear() and set_huge_pte_at(). This commit makes the required interface modifications to the core mm as well as all arches that implement this function (arm64, loongarch, mips, parisc, powerpc, riscv, s390, sparc). The actual arm64 bug will be fixed in a separate commit. 
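For illustration only (the helper name below is invented; the pattern simply mirrors existing set_huge_pte_at() and huge_pte_clear() callers), a call site that only has the vma to hand would derive the size and pass it through like this:

    static pte_t example_clear_huge_pte(struct vm_area_struct *vma,
                                        unsigned long addr, pte_t *ptep)
    {
            /* Derive the huge page size from the vma's hstate... */
            unsigned long sz = huge_page_size(hstate_vma(vma));

            /* ...and hand it to the updated API. */
            return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz);
    }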
Cc: Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/hugetlb.h | 4 ++-- arch/arm64/mm/hugetlbpage.c | 8 +++++--- arch/loongarch/include/asm/hugetlb.h | 6 ++++-- arch/mips/include/asm/hugetlb.h | 6 ++++-- arch/parisc/include/asm/hugetlb.h | 2 +- arch/parisc/mm/hugetlbpage.c | 2 +- arch/powerpc/include/asm/hugetlb.h | 6 ++++-- arch/riscv/include/asm/hugetlb.h | 3 ++- arch/riscv/mm/hugetlbpage.c | 2 +- arch/s390/include/asm/hugetlb.h | 12 ++++++++---- arch/s390/mm/hugetlbpage.c | 10 ++++++++-- arch/sparc/include/asm/hugetlb.h | 2 +- arch/sparc/mm/hugetlbpage.c | 2 +- include/asm-generic/hugetlb.h | 2 +- include/linux/hugetlb.h | 4 +++- mm/hugetlb.c | 4 ++-- 16 files changed, 48 insertions(+), 27 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/huge= tlb.h index c6dff3e69539..03db9cb21ace 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -42,8 +42,8 @@ extern int huge_ptep_set_access_flags(struct vm_area_stru= ct *vma, unsigned long addr, pte_t *ptep, pte_t pte, int dirty); #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR -extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep); +extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long a= ddr, + pte_t *ptep, unsigned long sz); #define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT extern void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep); diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 98a2a0e64e25..06db4649af91 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -396,8 +396,8 @@ void huge_pte_clear(struct mm_struct *mm, unsigned long= addr, __pte_clear(mm, addr, ptep); } -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) +pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, unsigned long sz) { int ncontig; size_t pgsize; @@ -549,6 +549,8 @@ bool __init arch_hugetlb_valid_size(unsigned long size) pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned lon= g addr, pte_t *ptep) { + unsigned long psize =3D huge_page_size(hstate_vma(vma)); + if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) { /* * Break-before-make (BBM) is required for all user space mappings @@ -558,7 +560,7 @@ pte_t huge_ptep_modify_prot_start(struct vm_area_struct= *vma, unsigned long addr if (pte_user_exec(__ptep_get(ptep))) return huge_ptep_clear_flush(vma, addr, ptep); } - return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize); } void huge_ptep_modify_prot_commit(struct vm_area_struct *vma, unsigned lon= g addr, pte_t *ptep, diff --git a/arch/loongarch/include/asm/hugetlb.h b/arch/loongarch/include/= asm/hugetlb.h index c8e4057734d0..4dc4b3e04225 100644 --- a/arch/loongarch/include/asm/hugetlb.h +++ b/arch/loongarch/include/asm/hugetlb.h @@ -36,7 +36,8 @@ static inline void huge_pte_clear(struct mm_struct *mm, u= nsigned long addr, #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) + unsigned long addr, pte_t *ptep, + unsigned long sz) { pte_t clear; pte_t pte =3D ptep_get(ptep); @@ -51,8 +52,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_= struct *vma, unsigned long addr, pte_t *ptep) { pte_t pte; + unsigned long sz =3D 
huge_page_size(hstate_vma(vma)); - pte =3D huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte =3D huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz); flush_tlb_page(vma, addr); return pte; } diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetl= b.h index d0a86ce83de9..fbc71ddcf0f6 100644 --- a/arch/mips/include/asm/hugetlb.h +++ b/arch/mips/include/asm/hugetlb.h @@ -27,7 +27,8 @@ static inline int prepare_hugepage_range(struct file *fil= e, #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) + unsigned long addr, pte_t *ptep, + unsigned long sz) { pte_t clear; pte_t pte =3D *ptep; @@ -42,13 +43,14 @@ static inline pte_t huge_ptep_clear_flush(struct vm_are= a_struct *vma, unsigned long addr, pte_t *ptep) { pte_t pte; + unsigned long sz =3D huge_page_size(hstate_vma(vma)); /* * clear the huge pte entry firstly, so that the other smp threads will * not get old pte entry after finishing flush_tlb_page and before * setting new huge pte entry */ - pte =3D huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte =3D huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz); flush_tlb_page(vma, addr); return pte; } diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hu= getlb.h index 5b3a5429f71b..21e9ace17739 100644 --- a/arch/parisc/include/asm/hugetlb.h +++ b/arch/parisc/include/asm/hugetlb.h @@ -10,7 +10,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long = addr, #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep); + pte_t *ptep, unsigned long sz); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c index e9d18cf25b79..a94fe546d434 100644 --- a/arch/parisc/mm/hugetlbpage.c +++ b/arch/parisc/mm/hugetlbpage.c @@ -126,7 +126,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned lon= g addr, pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep) + pte_t *ptep, unsigned long sz) { pte_t entry; diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/= hugetlb.h index dad2e7980f24..86326587e58d 100644 --- a/arch/powerpc/include/asm/hugetlb.h +++ b/arch/powerpc/include/asm/hugetlb.h @@ -45,7 +45,8 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long = addr, pte_t *ptep, #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) + unsigned long addr, pte_t *ptep, + unsigned long sz) { return __pte(pte_update(mm, addr, ptep, ~0UL, 0, 1)); } @@ -55,8 +56,9 @@ static inline pte_t huge_ptep_clear_flush(struct vm_area_= struct *vma, unsigned long addr, pte_t *ptep) { pte_t pte; + unsigned long sz =3D huge_page_size(hstate_vma(vma)); - pte =3D huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + pte =3D huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, sz); flush_hugetlb_page(vma, addr); return pte; } diff --git a/arch/riscv/include/asm/hugetlb.h b/arch/riscv/include/asm/huge= tlb.h index faf3624d8057..446126497768 100644 --- a/arch/riscv/include/asm/hugetlb.h +++ b/arch/riscv/include/asm/hugetlb.h @@ -28,7 +28,8 @@ void set_huge_pte_at(struct mm_struct *mm, #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep); + unsigned long addr, pte_t 
*ptep, + unsigned long sz); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c index 42314f093922..b4a78a4b35cf 100644 --- a/arch/riscv/mm/hugetlbpage.c +++ b/arch/riscv/mm/hugetlbpage.c @@ -293,7 +293,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *v= ma, pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep) + pte_t *ptep, unsigned long sz) { pte_t orig_pte =3D ptep_get(ptep); int pte_num; diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetl= b.h index 7c52acaf9f82..420c74306779 100644 --- a/arch/s390/include/asm/hugetlb.h +++ b/arch/s390/include/asm/hugetlb.h @@ -26,7 +26,11 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned lo= ng addr, pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep); #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pt= e_t *ptep); +pte_t huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, + unsigned long sz); +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep); static inline void arch_clear_hugetlb_flags(struct folio *folio) { @@ -48,7 +52,7 @@ static inline void huge_pte_clear(struct mm_struct *mm, u= nsigned long addr, static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { - return huge_ptep_get_and_clear(vma->vm_mm, address, ptep); + return __huge_ptep_get_and_clear(vma->vm_mm, address, ptep); } #define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS @@ -59,7 +63,7 @@ static inline int huge_ptep_set_access_flags(struct vm_ar= ea_struct *vma, int changed =3D !pte_same(huge_ptep_get(vma->vm_mm, addr, ptep), pte); if (changed) { - huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + __huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); __set_huge_pte_at(vma->vm_mm, addr, ptep, pte); } return changed; @@ -69,7 +73,7 @@ static inline int huge_ptep_set_access_flags(struct vm_ar= ea_struct *vma, static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - pte_t pte =3D huge_ptep_get_and_clear(mm, addr, ptep); + pte_t pte =3D __huge_ptep_get_and_clear(mm, addr, ptep); __set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte)); } diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index d9ce199953de..52ee8e854195 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -188,8 +188,8 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long= addr, pte_t *ptep) return __rste_to_pte(pte_val(*ptep)); } -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) +pte_t __huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) { pte_t pte =3D huge_ptep_get(mm, addr, ptep); pmd_t *pmdp =3D (pmd_t *) ptep; @@ -202,6 +202,12 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, return pte; } +pte_t huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep, unsigned long sz) +{ + return __huge_ptep_get_and_clear(mm, addr, ptep); +} + pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/huge= tlb.h index c714ca6a05aa..e7a9cdd498dc 100644 --- a/arch/sparc/include/asm/hugetlb.h +++ b/arch/sparc/include/asm/hugetlb.h @@ -20,7 
+20,7 @@ void __set_huge_pte_at(struct mm_struct *mm, unsigned lon= g addr, #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep); + pte_t *ptep, unsigned long sz); #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c index eee601a0d2cf..80504148d8a5 100644 --- a/arch/sparc/mm/hugetlbpage.c +++ b/arch/sparc/mm/hugetlbpage.c @@ -260,7 +260,7 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned lon= g addr, } pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep) + pte_t *ptep, unsigned long sz) { unsigned int i, nptes, orig_shift, shift; unsigned long size; diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h index f42133dae68e..2afc95bf1655 100644 --- a/include/asm-generic/hugetlb.h +++ b/include/asm-generic/hugetlb.h @@ -90,7 +90,7 @@ static inline void set_huge_pte_at(struct mm_struct *mm, = unsigned long addr, #ifndef __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, - unsigned long addr, pte_t *ptep) + unsigned long addr, pte_t *ptep, unsigned long sz) { return ptep_get_and_clear(mm, addr, ptep); } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ec8c0ccc8f95..bf5f7256bd28 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1004,7 +1004,9 @@ static inline void hugetlb_count_sub(long l, struct m= m_struct *mm) static inline pte_t huge_ptep_modify_prot_start(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { - return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep); + unsigned long psize =3D huge_page_size(hstate_vma(vma)); + + return huge_ptep_get_and_clear(vma->vm_mm, addr, ptep, psize); } #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 65068671e460..de9d49e521c1 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5447,7 +5447,7 @@ static void move_huge_pte(struct vm_area_struct *vma,= unsigned long old_addr, if (src_ptl !=3D dst_ptl) spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); - pte =3D huge_ptep_get_and_clear(mm, old_addr, src_pte); + pte =3D huge_ptep_get_and_clear(mm, old_addr, src_pte, sz); if (need_clear_uffd_wp && pte_marker_uffd_wp(pte)) huge_pte_clear(mm, new_addr, dst_pte, sz); @@ -5622,7 +5622,7 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, s= truct vm_area_struct *vma, set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED); } - pte =3D huge_ptep_get_and_clear(mm, address, ptep); + pte =3D huge_ptep_get_and_clear(mm, address, ptep, sz); tlb_remove_huge_tlb_entry(h, tlb, ptep, address); if (huge_pte_dirty(pte)) set_page_dirty(page); -- 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id E563916DEB1; Wed, 5 Feb 2025 15:10:33 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768235; cv=none; b=N/gb2DdMuVrx1zXA44fv+bXJsBlCOCR/0/KmDPxNEBb0yboq9v03tDDW7h7B4e8xChy23GG1ED7FjrPC2lPyOzU+CkiRYC57apLNQ95p3Y3zeDxGqBiieFJDQePTptkTuqEqBWtbhhQdHKOB2kIQWzX4uG+1rybsRhUPBLK1ocE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768235; c=relaxed/simple; bh=KL70WfIvZ+0KTsJ2YsHRotTUE/Q1P5EP4fGfxVWzfCs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: 
MIME-Version; b=cef/ClM7gMGbLqNnD52KaiLYjle+H84J2ET9BdYUpt+PQrGs5sQ0Rf3g690sCF5+0yNAeAVkppqoK7Whn+ucR8LdrDZ/90awz8fZYjfyiuDnMAjQ1pU+0NFgH9A0RdBq7et18UOoYLfOpm6O4pE+o/mrvl8srdD23KdoHXicHSk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 71E221063; Wed, 5 Feb 2025 07:10:56 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 36B093F5A1; Wed, 5 Feb 2025 07:10:30 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v1 02/16] arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes Date: Wed, 5 Feb 2025 15:09:42 +0000 Message-ID: <20250205151003.88959-3-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" arm64 supports multiple huge_pte sizes. Some of the sizes are covered by a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some are covered by multiple ptes at a particular level (CONT_PTE_SIZE, CONT_PMD_SIZE). So the function has to figure out the size from the huge_pte pointer. This was previously done by walking the pgtable to determine the level, then using the PTE_CONT bit to determine the number of ptes. But the PTE_CONT bit is only valid when the pte is present. For non-present pte values (e.g. markers, migration entries), the previous implementation was therefore erroniously determining the size. There is at least one known caller in core-mm, move_huge_pte(), which may call huge_ptep_get_and_clear() for a non-present pte. So we must be robust to this case. Additionally the "regular" ptep_get_and_clear() is robust to being called for non-present ptes so it makes sense to follow the behaviour. Fix this by using the new sz parameter which is now provided to the function. Additionally when clearing each pte in a contig range, don't gather the access and dirty bits if the pte is not present. An alternative approach that would not require API changes would be to store the PTE_CONT bit in a spare bit in the swap entry pte. But it felt cleaner to follow other APIs' lead and just pass in the size. While we are at it, add some debug warnings in functions that require the pte is present. As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap entry offset field (layout of non-present pte). 
Since hugetlb is never swapped to disk, this field will only be populated for markers, which always set this bit to 0 and hwpoison swap entries, which set the offset field to a PFN; So it would only ever be 1 for a 52-bit PVA system where memory in that high half was poisoned (I think!). So in practice, this bit would almost always be zero for non-present ptes and we would only clear the first entry if it was actually a contiguous block. That's probably a less severe symptom than if it was always interpretted as 1 and cleared out potentially-present neighboring PTEs. Cc: Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit") Signed-off-by: Ryan Roberts --- arch/arm64/mm/hugetlbpage.c | 54 ++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 25 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 06db4649af91..328eec4bfe55 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -163,24 +163,23 @@ static pte_t get_clear_contig(struct mm_struct *mm, unsigned long pgsize, unsigned long ncontig) { - pte_t orig_pte =3D __ptep_get(ptep); - unsigned long i; - - for (i =3D 0; i < ncontig; i++, addr +=3D pgsize, ptep++) { - pte_t pte =3D __ptep_get_and_clear(mm, addr, ptep); - - /* - * If HW_AFDBM is enabled, then the HW could turn on - * the dirty or accessed bit for any page in the set, - * so check them all. - */ - if (pte_dirty(pte)) - orig_pte =3D pte_mkdirty(orig_pte); - - if (pte_young(pte)) - orig_pte =3D pte_mkyoung(orig_pte); + pte_t pte, tmp_pte; + bool present; + + pte =3D __ptep_get_and_clear(mm, addr, ptep); + present =3D pte_present(pte); + while (--ncontig) { + ptep++; + addr +=3D pgsize; + tmp_pte =3D __ptep_get_and_clear(mm, addr, ptep); + if (present) { + if (pte_dirty(tmp_pte)) + pte =3D pte_mkdirty(pte); + if (pte_young(tmp_pte)) + pte =3D pte_mkyoung(pte); + } } - return orig_pte; + return pte; } =20 static pte_t get_clear_contig_flush(struct mm_struct *mm, @@ -401,13 +400,8 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, un= signed long addr, { int ncontig; size_t pgsize; - pte_t orig_pte =3D __ptep_get(ptep); - - if (!pte_cont(orig_pte)) - return __ptep_get_and_clear(mm, addr, ptep); - - ncontig =3D find_num_contig(mm, addr, ptep, &pgsize); =20 + ncontig =3D num_contig_ptes(sz, &pgsize); return get_clear_contig(mm, addr, ptep, pgsize, ncontig); } =20 @@ -451,6 +445,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *v= ma, pgprot_t hugeprot; pte_t orig_pte; =20 + VM_WARN_ON(!pte_present(pte)); + if (!pte_cont(pte)) return __ptep_set_access_flags(vma, addr, ptep, pte, dirty); =20 @@ -461,6 +457,7 @@ int huge_ptep_set_access_flags(struct vm_area_struct *v= ma, return 0; =20 orig_pte =3D get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig); + VM_WARN_ON(!pte_present(orig_pte)); =20 /* Make sure we don't lose the dirty or young state */ if (pte_dirty(orig_pte)) @@ -485,7 +482,10 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, size_t pgsize; pte_t pte; =20 - if (!pte_cont(__ptep_get(ptep))) { + pte =3D __ptep_get(ptep); + VM_WARN_ON(!pte_present(pte)); + + if (!pte_cont(pte)) { __ptep_set_wrprotect(mm, addr, ptep); return; } @@ -509,8 +509,12 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, struct mm_struct *mm =3D vma->vm_mm; size_t pgsize; int ncontig; + pte_t pte; =20 - if (!pte_cont(__ptep_get(ptep))) + pte =3D __ptep_get(ptep); + VM_WARN_ON(!pte_present(pte)); + + if (!pte_cont(pte)) return ptep_clear_flush(vma, addr, ptep); =20 ncontig =3D 
find_num_contig(mm, addr, ptep, &pgsize); --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 9C243188704; Wed, 5 Feb 2025 15:10:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768238; cv=none; b=GQRipW5+Q4kotHR4ffxHTjpioqucGBZeKQ8AVIMys8xBNJWsGNg/hm+ivr/uF9+62sTjAMzwmHX5mkOk3uiUbX42hSsqwGT5ObNsCUTPFUZj4Em2AmpKmu6NTq/LHgXZKUUuTDBM4mOWqLKXUV4HQIFHhn93UdlYMOzYcJCkQxQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768238; c=relaxed/simple; bh=3Q54yKXRdvqhDGPtg9zs6/cCiSVCV9GAhtafDq5qpLw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=RpP0Arv/2+Zob2lVoKX/H4Aw9XpLyqa9Wwteu6xp8XJGUFiEkOiXaEYFAiVFdCuVnieZ4J85GZ023XfPXKMVHgSPJjyxnBIVseDQ5iXn2OrNEAepbYvMZiY0r1njFB5BUnEc6rYlCFk09nkCEooevjk+lrKukr1J60tX7NZ1T9w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5FB141C01; Wed, 5 Feb 2025 07:10:59 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 242593F5A1; Wed, 5 Feb 2025 07:10:33 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v1 03/16] arm64: hugetlb: Fix flush_hugetlb_tlb_range() invalidation level Date: Wed, 5 Feb 2025 15:09:43 +0000 Message-ID: <20250205151003.88959-4-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" commit c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_LPA2") changed the "invalidation level unknown" hint from 0 to TLBI_TTL_UNKNOWN (INT_MAX). But the fallback "unknown level" path in flush_hugetlb_tlb_range() was not updated. So as it stands, when trying to invalidate CONT_PMD_SIZE or CONT_PTE_SIZE hugetlb mappings, we will spuriously try to invalidate at level 0 on LPA2-enabled systems. Fix this so that the fallback passes TLBI_TTL_UNKNOWN, and while we are at it, explicitly use the correct stride and level for CONT_PMD_SIZE and CONT_PTE_SIZE, which should provide a minor optimization. 
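For reference, assuming a 4K translation granule (the concrete sizes differ for 16K and 64K granules), the hugetlb sizes now map to TLBI stride/level hints as follows:

    PUD_SIZE      (1G)  -> stride PUD_SIZE,  level 1
    CONT_PMD_SIZE (32M) -> stride PMD_SIZE,  level 2
    PMD_SIZE      (2M)  -> stride PMD_SIZE,  level 2
    CONT_PTE_SIZE (64K) -> stride PAGE_SIZE, level 3
    other               -> stride PAGE_SIZE, TLBI_TTL_UNKNOWN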
Cc: Fixes: c910f2b65518 ("arm64/mm: Update tlb invalidation routines for FEAT_L= PA2") Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/hugetlb.h | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/huge= tlb.h index 03db9cb21ace..8ab9542d2d22 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -76,12 +76,20 @@ static inline void flush_hugetlb_tlb_range(struct vm_ar= ea_struct *vma, { unsigned long stride =3D huge_page_size(hstate_vma(vma)); =20 - if (stride =3D=3D PMD_SIZE) - __flush_tlb_range(vma, start, end, stride, false, 2); - else if (stride =3D=3D PUD_SIZE) - __flush_tlb_range(vma, start, end, stride, false, 1); - else - __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 0); + switch (stride) { + case PUD_SIZE: + __flush_tlb_range(vma, start, end, PUD_SIZE, false, 1); + break; + case CONT_PMD_SIZE: + case PMD_SIZE: + __flush_tlb_range(vma, start, end, PMD_SIZE, false, 2); + break; + case CONT_PTE_SIZE: + __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 3); + break; + default: + __flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN); + } } =20 #endif /* __ASM_HUGETLB_H */ --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 3F89718D649 for ; Wed, 5 Feb 2025 15:10:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768240; cv=none; b=O+zOxkAy8N+lQHPSlDThDvbXUuXC4Ci21y+VsgzS1PkgUq1oaRGdsWJM9+cwu5gPkbFV7V0+L8BorHW4jh3zeI7TwVBF/VRpw8O3n3+JuR4/XI+nm4SfoBawFs8R7UyYbTWNxJeUoFlkngwO+vlHQqujglr79igliCTmLehnVoE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768240; c=relaxed/simple; bh=nG0dEu8OskEFWpJvNUYYdqhbCpLn3sLULzDXgSAaafw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=WWmJ2/8OBYKWZyZ0ljm9BgrWYeAdbHpEZ4F85WsK8r3AySb9qf+kj+OpsKFOpCkoXZPBAt9ZS0tUDtKDAhwIIlmVDQNCwdhwWV8GT/2rXA4PKJVvzYROTc5SdekHBVtmEbkCXb8cKuRc01K4cnoQVuDZ7AkErHGMuqR2AFU4k4c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2DD511515; Wed, 5 Feb 2025 07:11:02 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 11E403F5A1; Wed, 5 Feb 2025 07:10:35 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 04/16] arm64: hugetlb: Refine tlb maintenance scope Date: Wed, 5 Feb 2025 15:09:44 +0000 Message-ID: <20250205151003.88959-5-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: 
<20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When operating on contiguous blocks of ptes (or pmds) for some hugetlb sizes, we must honour break-before-make requirements and clear down the block to invalid state in the pgtable then invalidate the relevant tlb entries before making the pgtable entries valid again. However, the tlb maintenance is currently always done assuming the worst case stride (PAGE_SIZE), last_level (false) and tlb_level (TLBI_TTL_UNKNOWN). We can do much better with the hinting; In reality, we know the stride from the huge_pte pgsize, we are always operating only on the last level, and we always know the tlb_level, again based on pgsize. So let's start providing these hints. Additionally, avoid tlb maintenace in set_huge_pte_at(). Break-before-make is only required if we are transitioning the contiguous pte block from valid -> valid. So let's elide the clear-and-flush ("break") if the pte range was previously invalid. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/hugetlb.h | 29 +++++++++++++++++++---------- arch/arm64/mm/hugetlbpage.c | 9 ++++++--- 2 files changed, 25 insertions(+), 13 deletions(-) diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/huge= tlb.h index 8ab9542d2d22..c38f2944c20d 100644 --- a/arch/arm64/include/asm/hugetlb.h +++ b/arch/arm64/include/asm/hugetlb.h @@ -69,27 +69,36 @@ extern void huge_ptep_modify_prot_commit(struct vm_area= _struct *vma, =20 #include =20 -#define __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE -static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma, - unsigned long start, - unsigned long end) +static inline void __flush_hugetlb_tlb_range(struct vm_area_struct *vma, + unsigned long start, + unsigned long end, + unsigned long stride, + bool last_level) { - unsigned long stride =3D huge_page_size(hstate_vma(vma)); - switch (stride) { case PUD_SIZE: - __flush_tlb_range(vma, start, end, PUD_SIZE, false, 1); + __flush_tlb_range(vma, start, end, PUD_SIZE, last_level, 1); break; case CONT_PMD_SIZE: case PMD_SIZE: - __flush_tlb_range(vma, start, end, PMD_SIZE, false, 2); + __flush_tlb_range(vma, start, end, PMD_SIZE, last_level, 2); break; case CONT_PTE_SIZE: - __flush_tlb_range(vma, start, end, PAGE_SIZE, false, 3); + __flush_tlb_range(vma, start, end, PAGE_SIZE, last_level, 3); break; default: - __flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN); + __flush_tlb_range(vma, start, end, PAGE_SIZE, last_level, TLBI_TTL_UNKNO= WN); } } =20 +#define __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE +static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma, + unsigned long start, + unsigned long end) +{ + unsigned long stride =3D huge_page_size(hstate_vma(vma)); + + __flush_hugetlb_tlb_range(vma, start, end, stride, false); +} + #endif /* __ASM_HUGETLB_H */ diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 328eec4bfe55..e870d01d12ea 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -190,8 +190,9 @@ static pte_t get_clear_contig_flush(struct mm_struct *m= m, { pte_t orig_pte =3D get_clear_contig(mm, addr, ptep, pgsize, ncontig); struct vm_area_struct vma =3D TLB_FLUSH_VMA(mm, 0); + unsigned long end =3D addr + (pgsize * ncontig); =20 - flush_tlb_range(&vma, addr, addr + 
(pgsize * ncontig)); + __flush_hugetlb_tlb_range(&vma, addr, end, pgsize, true); return orig_pte; } =20 @@ -216,7 +217,7 @@ static void clear_flush(struct mm_struct *mm, for (i =3D 0; i < ncontig; i++, addr +=3D pgsize, ptep++) __ptep_get_and_clear(mm, addr, ptep); =20 - flush_tlb_range(&vma, saddr, addr); + __flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true); } =20 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, @@ -245,7 +246,9 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned lon= g addr, dpfn =3D pgsize >> PAGE_SHIFT; hugeprot =3D pte_pgprot(pte); =20 - clear_flush(mm, addr, ptep, pgsize, ncontig); + /* Only need to "break" if transitioning valid -> valid. */ + if (pte_valid(__ptep_get(ptep))) + clear_flush(mm, addr, ptep, pgsize, ncontig); =20 for (i =3D 0; i < ncontig; i++, ptep++, addr +=3D pgsize, pfn +=3D dpfn) __set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1); --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 04E13192D97 for ; Wed, 5 Feb 2025 15:10:41 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768243; cv=none; b=O5nnLh5j/yejzXt81Q7SvrDjFiuHJH1lfEdO97QmDQNQO4agsXJgt2vJfwcPYDLIqRqyoQvKpGi8GVsD3BHifGoLUwTWq+b+JhHnjOynkDixuZG/1QdRNBEGwSZV7AGA3e1JiN7Yag6lc82DxZec2/hh8j8oTMGH8Pf9DXT6VqI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768243; c=relaxed/simple; bh=hq3KIoaraouIbFieN8wCFefIKAKmlTqavX0k7E5xqOM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TJwxobgzoq1POnImk21qu9sgUJh+8qlgMoB53nblROWjmTsWhwhtHi/C3OMoLQ53gtJ2FtJbCb0z1eFCJeWyMxnZRlNrtxrr/fe4z5a+GOHcsE6J2hcJLYNa8b6bOWxIzuVxu55r6I1YOvaeWN6DTk4Pj4BgKt8CZRzosP6vWBw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F04381063; Wed, 5 Feb 2025 07:11:04 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D46F73F5A1; Wed, 5 Feb 2025 07:10:38 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 05/16] mm/page_table_check: Batch-check pmds/puds just like ptes Date: Wed, 5 Feb 2025 15:09:45 +0000 Message-ID: <20250205151003.88959-6-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Convert 
page_table_check_p[mu]d_set(...) to page_table_check_p[mu]ds_set(..., nr) to allow checking a contiguous set of pmds/puds in single batch. We retain page_table_check_p[mu]d_set(...) as macros that call new batch functions with nr=3D1 for compatibility. arm64 is about to reorganise its pte/pmd/pud helpers to reuse more code and to allow the implementation for huge_pte to more efficiently set ptes/pmds/puds in batches. We need these batch-helpers to make the refactoring possible. Signed-off-by: Ryan Roberts Reviewed-by: Anshuman Khandual --- include/linux/page_table_check.h | 30 +++++++++++++++++----------- mm/page_table_check.c | 34 +++++++++++++++++++------------- 2 files changed, 38 insertions(+), 26 deletions(-) diff --git a/include/linux/page_table_check.h b/include/linux/page_table_ch= eck.h index 6722941c7cb8..289620d4aad3 100644 --- a/include/linux/page_table_check.h +++ b/include/linux/page_table_check.h @@ -19,8 +19,10 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, = pmd_t pmd); void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud); void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t = pte, unsigned int nr); -void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t p= md); -void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t p= ud); +void __page_table_check_pmds_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t = pmd, + unsigned int nr); +void __page_table_check_puds_set(struct mm_struct *mm, pud_t *pudp, pud_t = pud, + unsigned int nr); void __page_table_check_pte_clear_range(struct mm_struct *mm, unsigned long addr, pmd_t pmd); @@ -74,22 +76,22 @@ static inline void page_table_check_ptes_set(struct mm_= struct *mm, __page_table_check_ptes_set(mm, ptep, pte, nr); } =20 -static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *p= mdp, - pmd_t pmd) +static inline void page_table_check_pmds_set(struct mm_struct *mm, + pmd_t *pmdp, pmd_t pmd, unsigned int nr) { if (static_branch_likely(&page_table_check_disabled)) return; =20 - __page_table_check_pmd_set(mm, pmdp, pmd); + __page_table_check_pmds_set(mm, pmdp, pmd, nr); } =20 -static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *p= udp, - pud_t pud) +static inline void page_table_check_puds_set(struct mm_struct *mm, + pud_t *pudp, pud_t pud, unsigned int nr) { if (static_branch_likely(&page_table_check_disabled)) return; =20 - __page_table_check_pud_set(mm, pudp, pud); + __page_table_check_puds_set(mm, pudp, pud, nr); } =20 static inline void page_table_check_pte_clear_range(struct mm_struct *mm, @@ -129,13 +131,13 @@ static inline void page_table_check_ptes_set(struct m= m_struct *mm, { } =20 -static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *p= mdp, - pmd_t pmd) +static inline void page_table_check_pmds_set(struct mm_struct *mm, + pmd_t *pmdp, pmd_t pmd, unsigned int nr) { } =20 -static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *p= udp, - pud_t pud) +static inline void page_table_check_puds_set(struct mm_struct *mm, + pud_t *pudp, pud_t pud, unsigned int nr) { } =20 @@ -146,4 +148,8 @@ static inline void page_table_check_pte_clear_range(str= uct mm_struct *mm, } =20 #endif /* CONFIG_PAGE_TABLE_CHECK */ + +#define page_table_check_pmd_set(mm, pmdp, pmd) page_table_check_pmds_set(= mm, pmdp, pmd, 1) +#define page_table_check_pud_set(mm, pudp, pud) page_table_check_puds_set(= mm, pudp, pud, 1) + #endif /* __LINUX_PAGE_TABLE_CHECK_H */ diff --git a/mm/page_table_check.c 
b/mm/page_table_check.c index 509c6ef8de40..dae4a7d776b3 100644 --- a/mm/page_table_check.c +++ b/mm/page_table_check.c @@ -234,33 +234,39 @@ static inline void page_table_check_pmd_flags(pmd_t p= md) WARN_ON_ONCE(swap_cached_writable(pmd_to_swp_entry(pmd))); } =20 -void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t p= md) +void __page_table_check_pmds_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t = pmd, + unsigned int nr) { + unsigned int i; + unsigned long stride =3D PMD_SIZE >> PAGE_SHIFT; + if (&init_mm =3D=3D mm) return; =20 page_table_check_pmd_flags(pmd); =20 - __page_table_check_pmd_clear(mm, *pmdp); - if (pmd_user_accessible_page(pmd)) { - page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT, - pmd_write(pmd)); - } + for (i =3D 0; i < nr; i++) + __page_table_check_pmd_clear(mm, *(pmdp + i)); + if (pmd_user_accessible_page(pmd)) + page_table_check_set(pmd_pfn(pmd), stride * nr, pmd_write(pmd)); } -EXPORT_SYMBOL(__page_table_check_pmd_set); +EXPORT_SYMBOL(__page_table_check_pmds_set); =20 -void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t p= ud) +void __page_table_check_puds_set(struct mm_struct *mm, pud_t *pudp, pud_t = pud, + unsigned int nr) { + unsigned int i; + unsigned long stride =3D PUD_SIZE >> PAGE_SHIFT; + if (&init_mm =3D=3D mm) return; =20 - __page_table_check_pud_clear(mm, *pudp); - if (pud_user_accessible_page(pud)) { - page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT, - pud_write(pud)); - } + for (i =3D 0; i < nr; i++) + __page_table_check_pud_clear(mm, *(pudp + i)); + if (pud_user_accessible_page(pud)) + page_table_check_set(pud_pfn(pud), stride * nr, pud_write(pud)); } -EXPORT_SYMBOL(__page_table_check_pud_set); +EXPORT_SYMBOL(__page_table_check_puds_set); =20 void __page_table_check_pte_clear_range(struct mm_struct *mm, unsigned long addr, --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 80B4F19750B for ; Wed, 5 Feb 2025 15:10:44 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768246; cv=none; b=RwRnhfZ7LbNKQ1U6D5kEWXAOQCc+f1z7rFvKIfHjgHT8WCKdkbAl4eIK7270ejF5OGcg9uo7uBcm2p64atHUgt7y8GnDGqipeltJEsgUuL6OP4Dxq2hriPkmfLeZfJMCV8qrE8HRbRkhg3ViuDHpEJafGNpAXvG1Uv2z/ho6ibg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768246; c=relaxed/simple; bh=McSbjwKMv9JKAGvIDAcnu9DXXhsQsmXiIj4w4eA55x0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KjKjNmd4TyVfvuOBftsnkLId+N99prHkQA7/T3jyQLJxjW2T02v8D/+dfGfuqGVyKYZBCeO0amGGbHnsfe5+sz46cbJcounQe4DtEwFttLCRq0JN2d2rIEUi0KljKO3Il6R6IGVuoHoPoOIa2sskAnxSfziIKxkIS8515oBORL4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id BE7D41063; Wed, 5 Feb 2025 07:11:07 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A24303F5A1; Wed, 5 Feb 2025 
07:10:41 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 06/16] arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear() Date: Wed, 5 Feb 2025 15:09:46 +0000 Message-ID: <20250205151003.88959-7-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Refactor __set_ptes(), set_pmd_at() and set_pud_at() so that they are all a thin wrapper around a generic ___set_ptes(), which takes pgsize parameter. This cleans up the code to remove the confusing __set_pte_at() (which was only ever used for pmd/pud) and will allow us to perform future barrier optimizations in a single place. Additionally, it will permit the huge_pte API to efficiently batch-set pgtable entries and take advantage of the future barrier optimizations. ___set_ptes() calls the correct page_table_check_*_set() function based on the pgsize. This means that huge_ptes be able to get proper coverage regardless of their size, once it's plumbed into huge_pte. Currently the huge_pte API always uses the pte API which assumes an entry only covers a single page. While we are at it, refactor __ptep_get_and_clear() and pmdp_huge_get_and_clear() to use a common ___ptep_get_and_clear() which also takes a pgsize parameter. This will provide the huge_pte API the means to clear ptes corresponding with the way they were set. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 108 +++++++++++++++++++------------ 1 file changed, 67 insertions(+), 41 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 0b2a2ad1b9e8..3b55d9a15f05 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -420,23 +420,6 @@ static inline pte_t pte_advance_pfn(pte_t pte, unsigne= d long nr) return pfn_pte(pte_pfn(pte) + nr, pte_pgprot(pte)); } =20 -static inline void __set_ptes(struct mm_struct *mm, - unsigned long __always_unused addr, - pte_t *ptep, pte_t pte, unsigned int nr) -{ - page_table_check_ptes_set(mm, ptep, pte, nr); - __sync_cache_and_tags(pte, nr); - - for (;;) { - __check_safe_pte_update(mm, ptep, pte); - __set_pte(ptep, pte); - if (--nr =3D=3D 0) - break; - ptep++; - pte =3D pte_advance_pfn(pte, 1); - } -} - /* * Hugetlb definitions. 
*/ @@ -641,30 +624,59 @@ static inline pgprot_t pud_pgprot(pud_t pud) return __pgprot(pud_val(pfn_pud(pfn, __pgprot(0))) ^ pud_val(pud)); } =20 -static inline void __set_pte_at(struct mm_struct *mm, - unsigned long __always_unused addr, - pte_t *ptep, pte_t pte, unsigned int nr) +static inline void ___set_ptes(struct mm_struct *mm, pte_t *ptep, pte_t pt= e, + unsigned int nr, unsigned long pgsize) { - __sync_cache_and_tags(pte, nr); - __check_safe_pte_update(mm, ptep, pte); - __set_pte(ptep, pte); + unsigned long stride =3D pgsize >> PAGE_SHIFT; + + switch (pgsize) { + case PAGE_SIZE: + page_table_check_ptes_set(mm, ptep, pte, nr); + break; + case PMD_SIZE: + page_table_check_pmds_set(mm, (pmd_t *)ptep, pte_pmd(pte), nr); + break; + case PUD_SIZE: + page_table_check_puds_set(mm, (pud_t *)ptep, pte_pud(pte), nr); + break; + default: + VM_WARN_ON(1); + } + + __sync_cache_and_tags(pte, nr * stride); + + for (;;) { + __check_safe_pte_update(mm, ptep, pte); + __set_pte(ptep, pte); + if (--nr =3D=3D 0) + break; + ptep++; + pte =3D pte_advance_pfn(pte, stride); + } } =20 -static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr, - pmd_t *pmdp, pmd_t pmd) +static inline void __set_ptes(struct mm_struct *mm, + unsigned long __always_unused addr, + pte_t *ptep, pte_t pte, unsigned int nr) { - page_table_check_pmd_set(mm, pmdp, pmd); - return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd), - PMD_SIZE >> PAGE_SHIFT); + ___set_ptes(mm, ptep, pte, nr, PAGE_SIZE); } =20 -static inline void set_pud_at(struct mm_struct *mm, unsigned long addr, - pud_t *pudp, pud_t pud) +static inline void __set_pmds(struct mm_struct *mm, + unsigned long __always_unused addr, + pmd_t *pmdp, pmd_t pmd, unsigned int nr) +{ + ___set_ptes(mm, (pte_t *)pmdp, pmd_pte(pmd), nr, PMD_SIZE); +} +#define set_pmd_at(mm, addr, pmdp, pmd) __set_pmds(mm, addr, pmdp, pmd, 1) + +static inline void __set_puds(struct mm_struct *mm, + unsigned long __always_unused addr, + pud_t *pudp, pud_t pud, unsigned int nr) { - page_table_check_pud_set(mm, pudp, pud); - return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud), - PUD_SIZE >> PAGE_SHIFT); + ___set_ptes(mm, (pte_t *)pudp, pud_pte(pud), nr, PUD_SIZE); } +#define set_pud_at(mm, addr, pudp, pud) __set_puds(mm, addr, pudp, pud, 1) =20 #define __p4d_to_phys(p4d) __pte_to_phys(p4d_pte(p4d)) #define __phys_to_p4d_val(phys) __phys_to_pte_val(phys) @@ -1276,16 +1288,34 @@ static inline int pmdp_test_and_clear_young(struct = vm_area_struct *vma, } #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG= */ =20 -static inline pte_t __ptep_get_and_clear(struct mm_struct *mm, - unsigned long address, pte_t *ptep) +static inline pte_t ___ptep_get_and_clear(struct mm_struct *mm, pte_t *pte= p, + unsigned long pgsize) { pte_t pte =3D __pte(xchg_relaxed(&pte_val(*ptep), 0)); =20 - page_table_check_pte_clear(mm, pte); + switch (pgsize) { + case PAGE_SIZE: + page_table_check_pte_clear(mm, pte); + break; + case PMD_SIZE: + page_table_check_pmd_clear(mm, pte_pmd(pte)); + break; + case PUD_SIZE: + page_table_check_pud_clear(mm, pte_pud(pte)); + break; + default: + VM_WARN_ON(1); + } =20 return pte; } =20 +static inline pte_t __ptep_get_and_clear(struct mm_struct *mm, + unsigned long address, pte_t *ptep) +{ + return ___ptep_get_and_clear(mm, ptep, PAGE_SIZE); +} + static inline void __clear_full_ptes(struct mm_struct *mm, unsigned long a= ddr, pte_t *ptep, unsigned int nr, int full) { @@ -1322,11 +1352,7 @@ static inline pte_t __get_and_clear_full_ptes(struct= mm_struct *mm, 
static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm, unsigned long address, pmd_t *pmdp) { - pmd_t pmd =3D __pmd(xchg_relaxed(&pmd_val(*pmdp), 0)); - - page_table_check_pmd_clear(mm, pmd); - - return pmd; + return pte_pmd(___ptep_get_and_clear(mm, (pte_t *)pmdp, PMD_SIZE)); } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ =20 --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 8A5CB13C8E2 for ; Wed, 5 Feb 2025 15:10:47 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768249; cv=none; b=mM84xvb+eWKMx15w5066IudAJZa1hxy2iGxw69S0raheZ9smD/NpyREZRNK9GGrbA1cc3fSNe/bT8m3YxLlFCe6gCpwbE3TdWTsqq6D0iNyv1pVcjsYKKAK6O1YTLHkMx07by6k5vHMK71i/0eF7y1YnIm797oUQ0JGWlE3WkG0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768249; c=relaxed/simple; bh=k4UhMEqh2Zcc7EaPxh/zwh/SlA+GiDiVu3n75+jtOHs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=cuYknXmDvlk2/0DPyueFhWQ8AOx48X48+0Uurz6odH2Cg8ODex8IK0M9PcE/Gq8jfKSNqlR2wyUrfGZiZJ9EgmSRId3HYFlhSbKKUnq4Deae1uUbjzZW2DV7f5aMKw1YoukbiXlSOeQznpf7CyBHnlvWqO/wPlimtICKFuEPEZc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 8D37F1063; Wed, 5 Feb 2025 07:11:10 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 712D43F5A1; Wed, 5 Feb 2025 07:10:44 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 07/16] arm64: hugetlb: Use ___set_ptes() and ___ptep_get_and_clear() Date: Wed, 5 Feb 2025 15:09:47 +0000 Message-ID: <20250205151003.88959-8-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Refactor the huge_pte helpers to use the new generic ___set_ptes() and ___ptep_get_and_clear() APIs. This provides 2 benefits; First, when page_table_check=3Don, hugetlb is now properly/fully checked. Previously only the first page of a hugetlb folio was checked. Second, instead of having to call __set_ptes(nr=3D1) for each pte in a loop, the whole contiguous batch can now be set in one go, which enables some efficiencies and cleans up the code. One detail to note is that huge_ptep_clear_flush() was previously calling ptep_clear_flush() for a non-contiguous pte (i.e. a pud or pmd block mapping). 
This has a couple of disadvantages; first ptep_clear_flush() calls ptep_get_and_clear() which transparently handles contpte. Given we only call for non-contiguous ptes, it would be safe, but a waste of effort. It's preferable to go stright to the layer below. However, more problematic is that ptep_get_and_clear() is for PAGE_SIZE entries so it calls page_table_check_pte_clear() and would not clear the whole hugetlb folio. So let's stop special-casing the non-cont case and just rely on get_clear_contig_flush() to do the right thing for non-cont entries. Signed-off-by: Ryan Roberts --- arch/arm64/mm/hugetlbpage.c | 50 ++++++++----------------------------- 1 file changed, 11 insertions(+), 39 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index e870d01d12ea..02afee31444e 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -166,12 +166,12 @@ static pte_t get_clear_contig(struct mm_struct *mm, pte_t pte, tmp_pte; bool present; =20 - pte =3D __ptep_get_and_clear(mm, addr, ptep); + pte =3D ___ptep_get_and_clear(mm, ptep, pgsize); present =3D pte_present(pte); while (--ncontig) { ptep++; addr +=3D pgsize; - tmp_pte =3D __ptep_get_and_clear(mm, addr, ptep); + tmp_pte =3D ___ptep_get_and_clear(mm, ptep, pgsize); if (present) { if (pte_dirty(tmp_pte)) pte =3D pte_mkdirty(pte); @@ -215,7 +215,7 @@ static void clear_flush(struct mm_struct *mm, unsigned long i, saddr =3D addr; =20 for (i =3D 0; i < ncontig; i++, addr +=3D pgsize, ptep++) - __ptep_get_and_clear(mm, addr, ptep); + ___ptep_get_and_clear(mm, ptep, pgsize); =20 __flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true); } @@ -226,32 +226,20 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned l= ong addr, size_t pgsize; int i; int ncontig; - unsigned long pfn, dpfn; - pgprot_t hugeprot; =20 ncontig =3D num_contig_ptes(sz, &pgsize); =20 if (!pte_present(pte)) { for (i =3D 0; i < ncontig; i++, ptep++, addr +=3D pgsize) - __set_ptes(mm, addr, ptep, pte, 1); + ___set_ptes(mm, ptep, pte, 1, pgsize); return; } =20 - if (!pte_cont(pte)) { - __set_ptes(mm, addr, ptep, pte, 1); - return; - } - - pfn =3D pte_pfn(pte); - dpfn =3D pgsize >> PAGE_SHIFT; - hugeprot =3D pte_pgprot(pte); - /* Only need to "break" if transitioning valid -> valid. 
*/ - if (pte_valid(__ptep_get(ptep))) + if (pte_cont(pte) && pte_valid(__ptep_get(ptep))) clear_flush(mm, addr, ptep, pgsize, ncontig); =20 - for (i =3D 0; i < ncontig; i++, ptep++, addr +=3D pgsize, pfn +=3D dpfn) - __set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1); + ___set_ptes(mm, ptep, pte, ncontig, pgsize); } =20 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, @@ -441,11 +429,9 @@ int huge_ptep_set_access_flags(struct vm_area_struct *= vma, unsigned long addr, pte_t *ptep, pte_t pte, int dirty) { - int ncontig, i; + int ncontig; size_t pgsize =3D 0; - unsigned long pfn =3D pte_pfn(pte), dpfn; struct mm_struct *mm =3D vma->vm_mm; - pgprot_t hugeprot; pte_t orig_pte; =20 VM_WARN_ON(!pte_present(pte)); @@ -454,7 +440,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *v= ma, return __ptep_set_access_flags(vma, addr, ptep, pte, dirty); =20 ncontig =3D find_num_contig(mm, addr, ptep, &pgsize); - dpfn =3D pgsize >> PAGE_SHIFT; =20 if (!__cont_access_flags_changed(ptep, pte, ncontig)) return 0; @@ -469,19 +454,14 @@ int huge_ptep_set_access_flags(struct vm_area_struct = *vma, if (pte_young(orig_pte)) pte =3D pte_mkyoung(pte); =20 - hugeprot =3D pte_pgprot(pte); - for (i =3D 0; i < ncontig; i++, ptep++, addr +=3D pgsize, pfn +=3D dpfn) - __set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1); - + ___set_ptes(mm, ptep, pte, ncontig, pgsize); return 1; } =20 void huge_ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { - unsigned long pfn, dpfn; - pgprot_t hugeprot; - int ncontig, i; + int ncontig; size_t pgsize; pte_t pte; =20 @@ -494,16 +474,11 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm, } =20 ncontig =3D find_num_contig(mm, addr, ptep, &pgsize); - dpfn =3D pgsize >> PAGE_SHIFT; =20 pte =3D get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig); pte =3D pte_wrprotect(pte); =20 - hugeprot =3D pte_pgprot(pte); - pfn =3D pte_pfn(pte); - - for (i =3D 0; i < ncontig; i++, ptep++, addr +=3D pgsize, pfn +=3D dpfn) - __set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1); + ___set_ptes(mm, ptep, pte, ncontig, pgsize); } =20 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, @@ -517,10 +492,7 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, pte =3D __ptep_get(ptep); VM_WARN_ON(!pte_present(pte)); =20 - if (!pte_cont(pte)) - return ptep_clear_flush(vma, addr, ptep); - - ncontig =3D find_num_contig(mm, addr, ptep, &pgsize); + ncontig =3D num_contig_ptes(page_size(pte_page(pte)), &pgsize); return get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig); } =20 --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 2D8851B532F for ; Wed, 5 Feb 2025 15:10:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768251; cv=none; b=LmI9Zo6bSc43z/SZQYkrhI0ELadSkDTmey+yfK7jl9miQkBh7DZWUV2BlX6Q9w7KoaSwtYUgT3rdS6xZAdqDzcfXOuCkmCTBq8ysGmP/sUuHmdw2rB96jUynk77ydvHpPkHryAQqOwZNtbDucpI78661rqg0dT5j1V5DJX3zSxQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768251; c=relaxed/simple; bh=Vl8FxaSXqon0BJeHQ6SiEPfjucP2xExTZXy5dxc2Euw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=tfvSDv93yaMA+fystggRCtufOA8nfQG1Im17vMR63uJPuR4wg7LPV72ouOSF/4BxIJpGWh82Eg/SCTWApwTmcGTiSR0ka8OI3l3kEheTfv1oid9echiY4NJ4ZzuoISpqNL686WsNlBXGQABkVdjkchYKlIhHLosqviSgna++Lec= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 5B6AD1063; Wed, 5 Feb 2025 07:11:13 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3F23B3F5A1; Wed, 5 Feb 2025 07:10:47 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 08/16] arm64/mm: Hoist barriers out of ___set_ptes() loop Date: Wed, 5 Feb 2025 15:09:48 +0000 Message-ID: <20250205151003.88959-9-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" ___set_ptes() previously called __set_pte() for each PTE in the range, which would conditionally issue a DSB and ISB to make the new PTE value immediately visible to the table walker if the new PTE was valid and for kernel space. We can do better than this; let's hoist those barriers out of the loop so that they are only issued once at the end of the loop. We then reduce the cost by the number of PTEs in the range. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 3b55d9a15f05..1d428e9c0e5a 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -317,10 +317,8 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t= pte) WRITE_ONCE(*ptep, pte); } =20 -static inline void __set_pte(pte_t *ptep, pte_t pte) +static inline void __set_pte_complete(pte_t pte) { - __set_pte_nosync(ptep, pte); - /* * Only if the new pte is valid and kernel, otherwise TLB maintenance * or update_mmu_cache() have the necessary barriers. 
@@ -331,6 +329,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte) } } =20 +static inline void __set_pte(pte_t *ptep, pte_t pte) +{ + __set_pte_nosync(ptep, pte); + __set_pte_complete(pte); +} + static inline pte_t __ptep_get(pte_t *ptep) { return READ_ONCE(*ptep); @@ -647,12 +651,14 @@ static inline void ___set_ptes(struct mm_struct *mm, = pte_t *ptep, pte_t pte, =20 for (;;) { __check_safe_pte_update(mm, ptep, pte); - __set_pte(ptep, pte); + __set_pte_nosync(ptep, pte); if (--nr =3D=3D 0) break; ptep++; pte =3D pte_advance_pfn(pte, stride); } + + __set_pte_complete(pte); } =20 static inline void __set_ptes(struct mm_struct *mm, --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 32F2F1DAC95 for ; Wed, 5 Feb 2025 15:10:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768254; cv=none; b=l5w1F8kWeyEIYCaH4VjJdScsyPfE+hu/Pa1bMPSOJaRMGj8o6ekXPlNgDXkNha0ElqKqnNSSA1OL/qzZwdlTXxxfHT7dixY4nji9LiD8dLbfXyrNIXLwyvscp3GQkh8dj0ufyoNxuWIckJv5cZ7+8NsxKxE4yOzsBVsKWE/5p24= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768254; c=relaxed/simple; bh=we+4uLqMLRcQvkQL3DqN30UXhLctct4D1lj8cXVEU2U=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=nr3LolhzCNK9T/TBriULhApnP9hsWPtzxThPnhdU97Dp2lfjuBwDQ4uM2EK8EjmyDfP8Gp9ZYtakg92F+vzUq3xwLLmYRJ9NPwczE6vc8yTxWaed7U9jssNGPRUfmads/Q3PAFMBHuugnEiXDxumxhiMmnbae9GXotPwNZtWzZY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 2A0151063; Wed, 5 Feb 2025 07:11:16 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 0DD7A3F5A1; Wed, 5 Feb 2025 07:10:49 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 09/16] arm64/mm: Avoid barriers for invalid or userspace mappings Date: Wed, 5 Feb 2025 15:09:49 +0000 Message-ID: <20250205151003.88959-10-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" __set_pte_complete(), set_pmd(), set_pud(), set_p4d() and set_pgd() are used to write entries into pgtables. And they issue barriers (currently dsb and isb) to ensure that the written values are observed by the table walker prior to any program-order-future memory access to the mapped location. 
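For illustration only (a condensed sketch, not a hunk from this patch), the write-then-barrier pattern being discussed has this basic shape; whether the barriers are emitted conditionally varies by level, as described next:

  static inline void set_pxd(pxd_t *pxdp, pxd_t pxd)  /* "pxd" stands in for pmd/pud/p4d/pgd */
  {
          WRITE_ONCE(*pxdp, pxd);

          /*
           * dsb(ishst) makes the store visible to the table walker before
           * any program-order-later access to the mapped range; isb()
           * discards any speculation based on the old entry.
           */
          dsb(ishst);
          isb();
  }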
Over the years some of these functions have received optimizations: In particular, commit 7f0b1bf04511 ("arm64: Fix barriers used for page table modifications") made it so that the barriers were only emitted for valid-kernel mappings for set_pte() (now __set_pte_complete()). And commit 0795edaf3f1f ("arm64: pgtable: Implement p[mu]d_valid() and check in set_p[mu]d()") made it so that set_pmd()/set_pud() only emitted the barriers for valid mappings. set_p4d()/set_pgd() continue to emit the barriers unconditionally. This is all very confusing to the casual observer; surely the rules should be invariant to the level? Let's change this so that every level consistently emits the barriers only when setting valid, non-user entries (both table and leaf). It seems obvious that if it is ok to elide barriers for all but valid kernel mappings at pte level, it must also be ok to do this for leaf entries at other levels: If setting an entry to invalid, a TLB maintenance operation must surely follow to synchronise the TLB and this contains the required barriers. If setting a valid user mapping, the previous mapping must have been invalid and there must have been a TLB maintenance operation (complete with barriers) to honour break-before-make. So the worst that can happen is we take an extra fault (which will imply the DSB + ISB) and conclude that there is nothing to do. These are the arguments for doing this optimization at pte level and they also apply to leaf mappings at other levels. For table entries, the same arguments hold: If unsetting a table entry, a TLB maintenance operation is required and this will emit the required barriers. If setting a table entry, the previous value must have been invalid and the table walker must already be able to observe that. Additionally the contents of the pgtable being pointed to in the newly set entry must be visible before the entry is written and this is enforced via smp_wmb() (dmb) in the pgtable allocation functions and in __split_huge_pmd_locked(). But this last part could never have been enforced by the barriers in set_pXd() because they occur after updating the entry. So ultimately, the worst that can happen by eliding these barriers for user table entries is an extra fault. I observe roughly the same number of page faults (107M) with and without this change when compiling the kernel on Apple M2. Signed-off-by: Ryan Roberts --- arch/arm64/include/asm/pgtable.h | 60 ++++++++++++++++++++++++++++---- 1 file changed, 54 insertions(+), 6 deletions(-) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgta= ble.h index 1d428e9c0e5a..ff358d983583 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -767,6 +767,19 @@ static inline bool in_swapper_pgdir(void *addr) ((unsigned long)swapper_pg_dir & PAGE_MASK); } =20 +static inline bool pmd_valid_not_user(pmd_t pmd) +{ + /* + * User-space table pmd entries always have (PXN && !UXN). All other + * combinations indicate it's a table entry for kernel space. + * Valid-not-user leaf entries follow the same rules as + * pte_valid_not_user().
+ */ + if (pmd_table(pmd)) + return !((pmd_val(pmd) & (PMD_TABLE_PXN | PMD_TABLE_UXN)) =3D=3D PMD_TAB= LE_PXN); + return pte_valid_not_user(pmd_pte(pmd)); +} + static inline void set_pmd(pmd_t *pmdp, pmd_t pmd) { #ifdef __PAGETABLE_PMD_FOLDED @@ -778,7 +791,7 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd) =20 WRITE_ONCE(*pmdp, pmd); =20 - if (pmd_valid(pmd)) { + if (pmd_valid_not_user(pmd)) { dsb(ishst); isb(); } @@ -836,6 +849,17 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd) =20 static inline bool pgtable_l4_enabled(void); =20 + +static inline bool pud_valid_not_user(pud_t pud) +{ + /* + * Follows the same rules as pmd_valid_not_user(). + */ + if (pud_table(pud)) + return !((pud_val(pud) & (PUD_TABLE_PXN | PUD_TABLE_UXN)) =3D=3D PUD_TAB= LE_PXN); + return pte_valid_not_user(pud_pte(pud)); +} + static inline void set_pud(pud_t *pudp, pud_t pud) { if (!pgtable_l4_enabled() && in_swapper_pgdir(pudp)) { @@ -845,7 +869,7 @@ static inline void set_pud(pud_t *pudp, pud_t pud) =20 WRITE_ONCE(*pudp, pud); =20 - if (pud_valid(pud)) { + if (pud_valid_not_user(pud)) { dsb(ishst); isb(); } @@ -917,6 +941,16 @@ static inline bool mm_pud_folded(const struct mm_struc= t *mm) #define p4d_bad(p4d) (pgtable_l4_enabled() && !(p4d_val(p4d) & P4D_TABLE_= BIT)) #define p4d_present(p4d) (!p4d_none(p4d)) =20 +static inline bool p4d_valid_not_user(p4d_t p4d) +{ + /* + * User-space table p4d entries always have (PXN && !UXN). All other + * combinations indicate it's a table entry for kernel space. p4d block + * entries are not supported. + */ + return !((p4d_val(p4d) & (P4D_TABLE_PXN | P4D_TABLE_UXN)) =3D=3D P4D_TABL= E_PXN); +} + static inline void set_p4d(p4d_t *p4dp, p4d_t p4d) { if (in_swapper_pgdir(p4dp)) { @@ -925,8 +959,11 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d) } =20 WRITE_ONCE(*p4dp, p4d); - dsb(ishst); - isb(); + + if (p4d_valid_not_user(p4d)) { + dsb(ishst); + isb(); + } } =20 static inline void p4d_clear(p4d_t *p4dp) @@ -1044,6 +1081,14 @@ static inline bool mm_p4d_folded(const struct mm_str= uct *mm) #define pgd_bad(pgd) (pgtable_l5_enabled() && !(pgd_val(pgd) & PGD_TABLE_= BIT)) #define pgd_present(pgd) (!pgd_none(pgd)) =20 +static inline bool pgd_valid_not_user(pgd_t pgd) +{ + /* + * Follows the same rules as p4d_valid_not_user(). 
+ */ + return !((pgd_val(pgd) & (PGD_TABLE_PXN | PGD_TABLE_UXN)) =3D=3D PGD_TABL= E_PXN); +} + static inline void set_pgd(pgd_t *pgdp, pgd_t pgd) { if (in_swapper_pgdir(pgdp)) { @@ -1052,8 +1097,11 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd) } =20 WRITE_ONCE(*pgdp, pgd); - dsb(ishst); - isb(); + + if (pgd_valid_not_user(pgd)) { + dsb(ishst); + isb(); + } } =20 static inline void pgd_clear(pgd_t *pgdp) --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id B776E1DB34C for ; Wed, 5 Feb 2025 15:10:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768257; cv=none; b=NB6ckWHav8a9YilkWf4Z4O3rZQxU1oQJ7oXRXeZiviemKHCKe59kZQINi+Cjo5LB2m3wjST7lsIJczFYeMGxJdbWw/C+UN09xh6J3Ftg0LqfuzLDqZCtwzFVU7iOqB4H+gceU58K1xPDFWSgvc0HPqpd5KSXUL1Iy9wfxoP8ifE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768257; c=relaxed/simple; bh=3+ri9qq2fCmQAM0EaC9iYqAON/JEG066r4jXDBoLZLc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=REJnrDrbbDl/E1JULtU0l9gclo4na4BV+Fl8l/g9oMdVKCoNX9bku7dxbqO+WSDzwKorkKT5kgDGoaQch2Dw/pKyx33JotMGGngcCJoKz9F2TjmIWrYKNq7/ZUwq4lA9EnTcnZTQDXLByLt+2PMGP/6ZHbagN8VfvK9OhkJ78dw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id EC4DA1063; Wed, 5 Feb 2025 07:11:18 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id D00F53F5A1; Wed, 5 Feb 2025 07:10:52 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 10/16] mm/vmalloc: Warn on improper use of vunmap_range() Date: Wed, 5 Feb 2025 15:09:50 +0000 Message-ID: <20250205151003.88959-11-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" A call to vmalloc_huge() may cause memory blocks to be mapped at pmd or pud level. But it is possible to subsequently call vunmap_range() on a sub-range of the mapped memory, which partially overlaps a pmd or pud. In this case, vmalloc unmaps the entire pmd or pud so that the non-overlapping portion is also unmapped. Clearly that would have a bad outcome, but it's not something that any callers do today as far as I can tell. So I guess it's just expected that callers will not do this.
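A purely hypothetical example of the misuse described above (no in-tree caller does this today):

  /*
   * Hypothetical misuse: the allocation below may be backed by a 2M PMD
   * block mapping, so asking vmalloc to unmap only the first 4K forces it
   * to clear the whole PMD, tearing down mappings that were never asked
   * to be removed.
   */
  void *p = vmalloc_huge(SZ_2M, GFP_KERNEL);

  vunmap_range((unsigned long)p, (unsigned long)p + PAGE_SIZE);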
However, it would be useful to know if this happened in future; let's add a warning to cover the eventuality. Signed-off-by: Ryan Roberts --- mm/vmalloc.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a6e7acebe9ad..fcdf67d5177a 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -374,8 +374,10 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long= addr, unsigned long end, if (cleared || pmd_bad(*pmd)) *mask |=3D PGTBL_PMD_MODIFIED; =20 - if (cleared) + if (cleared) { + WARN_ON(next - addr < PMD_SIZE); continue; + } if (pmd_none_or_clear_bad(pmd)) continue; vunmap_pte_range(pmd, addr, next, mask); @@ -399,8 +401,10 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long= addr, unsigned long end, if (cleared || pud_bad(*pud)) *mask |=3D PGTBL_PUD_MODIFIED; =20 - if (cleared) + if (cleared) { + WARN_ON(next - addr < PUD_SIZE); continue; + } if (pud_none_or_clear_bad(pud)) continue; vunmap_pmd_range(pud, addr, next, mask); --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id ABDB01DBB13 for ; Wed, 5 Feb 2025 15:10:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768260; cv=none; b=H3N/kHaOTm7qbzWT9dhfg6jobqs7EcJY6lRvkYo4LAe0j6caT9ppW/L164bfMnC/bT3WL2bG0xHZz6RXE3c2WCG1SxDPwcnwa1Z6EWEXACxZvNY0UN5W88fJ1MbX98SqLjQWTfI6Z3GhM4k3+QiaMF8mdse94sfI02GrOjdZInw= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768260; c=relaxed/simple; bh=QXe8dUCDp+BKR+2eya14NSQxvp87N/vZ+M9E5qojvFo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=JyBY1ZTq68U18XyQ9mCmVyIROpEiA1MKhfwr3WGZ2Cc2KUX9zbECyx79/rUKp80I9YKvmJCQ1klto9t8b2VLVzHncy/amsUa1dhHYeugJCHY5GRz8SKCj/KlXp+3IYXqPqTVVg+3+u3ql7d4V2mLb6tC3DF0GqvpOm4SE1suO+o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D45141063; Wed, 5 Feb 2025 07:11:21 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 9E5CF3F5A1; Wed, 5 Feb 2025 07:10:55 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 11/16] mm/vmalloc: Gracefully unmap huge ptes Date: Wed, 5 Feb 2025 15:09:51 +0000 Message-ID: <20250205151003.88959-12-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
Content-Type: text/plain; charset="utf-8" Commit f7ee1f13d606 ("mm/vmalloc: enable mapping of huge pages at pte level in vmap") added its support by reusing the set_huge_pte_at() API, which is otherwise only used for user mappings. But when unmapping those huge ptes, it continued to call ptep_get_and_clear(), which is a layering violation. To date, the only arch to implement this support is powerpc and it all happens to work ok for it. But arm64's implementation of ptep_get_and_clear() can not be safely used to clear a previous set_huge_pte_at(). So let's introduce a new arch opt-in function, arch_vmap_pte_range_unmap_size(), which can provide the size of a (present) pte. Then we can call huge_ptep_get_and_clear() to tear it down properly. Note that if vunmap_range() is called with a range that starts in the middle of a huge pte-mapped page, we must unmap the entire huge page so the behaviour is consistent with pmd and pud block mappings. In this case emit a warning just like we do for pmd/pud mappings. Signed-off-by: Ryan Roberts Reviewed-by: Anshuman Khandual --- include/linux/vmalloc.h | 8 ++++++++ mm/vmalloc.c | 18 ++++++++++++++++-- 2 files changed, 24 insertions(+), 2 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 31e9ffd936e3..16dd4cba64f2 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -113,6 +113,14 @@ static inline unsigned long arch_vmap_pte_range_map_si= ze(unsigned long addr, uns } #endif =20 +#ifndef arch_vmap_pte_range_unmap_size +static inline unsigned long arch_vmap_pte_range_unmap_size(unsigned long a= ddr, + pte_t *ptep) +{ + return PAGE_SIZE; +} +#endif + #ifndef arch_vmap_pte_supported_shift static inline int arch_vmap_pte_supported_shift(unsigned long size) { diff --git a/mm/vmalloc.c b/mm/vmalloc.c index fcdf67d5177a..6111ce900ec4 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -350,12 +350,26 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned lon= g addr, unsigned long end, pgtbl_mod_mask *mask) { pte_t *pte; + pte_t ptent; + unsigned long size =3D PAGE_SIZE; =20 pte =3D pte_offset_kernel(pmd, addr); do { - pte_t ptent =3D ptep_get_and_clear(&init_mm, addr, pte); +#ifdef CONFIG_HUGETLB_PAGE + size =3D arch_vmap_pte_range_unmap_size(addr, pte); + if (size !=3D PAGE_SIZE) { + if (WARN_ON(!IS_ALIGNED(addr, size))) { + addr =3D ALIGN_DOWN(addr, size); + pte =3D PTR_ALIGN_DOWN(pte, sizeof(*pte) * (size >> PAGE_SHIFT)); + } + ptent =3D huge_ptep_get_and_clear(&init_mm, addr, pte, size); + if (WARN_ON(end - addr < size)) + size =3D end - addr; + } else +#endif + ptent =3D ptep_get_and_clear(&init_mm, addr, pte); WARN_ON(!pte_none(ptent) && !pte_present(ptent)); - } while (pte++, addr +=3D PAGE_SIZE, addr !=3D end); + } while (pte +=3D (size >> PAGE_SHIFT), addr +=3D size, addr !=3D end); *mask |=3D PGTBL_PTE_MODIFIED; } =20 --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 6214D1DDC16 for ; Wed, 5 Feb 2025 15:11:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768263; cv=none; b=QKEtsmL5utzA+4hhCQihcYAlhbitBgpB50jxse4gQNw5JeO+y9SJXQ6Drwzfmcu0J9f18zUgENPWRWQgo/9wRUmNuTkVee+e1Ro/qPUG/k5VB4K5Mwrcu4w/MtA7QJYgq3VyoVbpVySAoLFI9IEawTSXBYWLriuh8TN4C9d0sIg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768263; c=relaxed/simple; 
bh=4HaEif7VAgw7qFBNCqe75y3ls+QYvgNm1pLd1XNDZsQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=E8eI6j5rnQt7v3LAmgafxxeJjaYAgd2KCRk0dCzo6JHaVIji+diamHCgowuBfCcWXnZ7OIDI3hAU4uspxqW9sLu2rOeMe5+XY5eVKYppNOC/QaW3W700O2xO2Q08gS40t++XGnZzoOdTr0GdkrTDJ1RZRmayYvmNF1TR4YtL2i8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 88D461C01; Wed, 5 Feb 2025 07:11:24 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 6CE293F5A1; Wed, 5 Feb 2025 07:10:58 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 12/16] arm64/mm: Support huge pte-mapped pages in vmap Date: Wed, 5 Feb 2025 15:09:52 +0000 Message-ID: <20250205151003.88959-13-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Implement the required arch functions to enable use of contpte in the vmap when VM_ALLOW_HUGE_VMAP is specified. This speeds up vmap operations due to only having to issue a DSB and ISB per contpte block instead of per pte. But it also means that the TLB pressure reduces due to only needing a single TLB entry for the whole contpte block. Since vmap uses set_huge_pte_at() to set the contpte, that API is now used for kernel mappings for the first time. Although in the vmap case we never expect it to be called to modify a valid mapping so clear_flush() should never be called, it's still wise to make it robust for the kernel case, so amend the tlb flush function if the mm is for kernel space. Tested with vmalloc performance selftests: # kself/mm/test_vmalloc.sh \ run_test_mask=3D1 test_repeat_count=3D5 nr_pages=3D256 test_loop_count=3D100000 use_huge=3D1 Duration reduced from 1274243 usec to 1083553 usec on Apple M2 for 15% reduction in time taken. 
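As an illustrative usage sketch (the allocation size is an arbitrary example, not taken from the patch), a sufficiently large and aligned huge-vmap allocation can now end up contpte-mapped on arm64:

  /*
   * Illustrative only: a VM_ALLOW_HUGE_VMAP allocation of at least
   * CONT_PTE_SIZE (64K with 4K base pages) that cannot use a PMD block
   * mapping may now be mapped with contpte blocks, needing one TLB entry
   * per 16 ptes instead of one per pte.
   */
  void *buf = vmalloc_huge(SZ_128K, GFP_KERNEL);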
Signed-off-by: Ryan Roberts Reviewed-by: Anshuman Khandual --- arch/arm64/include/asm/vmalloc.h | 40 ++++++++++++++++++++++++++++++++ arch/arm64/mm/hugetlbpage.c | 5 +++- 2 files changed, 44 insertions(+), 1 deletion(-) diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmal= loc.h index 38fafffe699f..fbdeb40f3857 100644 --- a/arch/arm64/include/asm/vmalloc.h +++ b/arch/arm64/include/asm/vmalloc.h @@ -23,6 +23,46 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot) return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS); } =20 +#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size +static inline unsigned long arch_vmap_pte_range_map_size(unsigned long add= r, + unsigned long end, u64 pfn, + unsigned int max_page_shift) +{ + if (max_page_shift < CONT_PTE_SHIFT) + return PAGE_SIZE; + + if (end - addr < CONT_PTE_SIZE) + return PAGE_SIZE; + + if (!IS_ALIGNED(addr, CONT_PTE_SIZE)) + return PAGE_SIZE; + + if (!IS_ALIGNED(PFN_PHYS(pfn), CONT_PTE_SIZE)) + return PAGE_SIZE; + + return CONT_PTE_SIZE; +} + +#define arch_vmap_pte_range_unmap_size arch_vmap_pte_range_unmap_size +static inline unsigned long arch_vmap_pte_range_unmap_size(unsigned long a= ddr, + pte_t *ptep) +{ + /* + * The caller handles alignment so it's sufficient just to check + * PTE_CONT. + */ + return pte_valid_cont(__ptep_get(ptep)) ? CONT_PTE_SIZE : PAGE_SIZE; +} + +#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift +static inline int arch_vmap_pte_supported_shift(unsigned long size) +{ + if (size >=3D CONT_PTE_SIZE) + return CONT_PTE_SHIFT; + + return PAGE_SHIFT; +} + #endif =20 #define arch_vmap_pgprot_tagged arch_vmap_pgprot_tagged diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 02afee31444e..a74e43101dad 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -217,7 +217,10 @@ static void clear_flush(struct mm_struct *mm, for (i =3D 0; i < ncontig; i++, addr +=3D pgsize, ptep++) ___ptep_get_and_clear(mm, ptep, pgsize); =20 - __flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true); + if (mm =3D=3D &init_mm) + flush_tlb_kernel_range(saddr, addr); + else + __flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true); } =20 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 412341DF98F; Wed, 5 Feb 2025 15:11:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768265; cv=none; b=bWTER2lDicq0pgpiJw6ACPP/GwbT+An95Wphl0QfKCpoLAAHDL76pV1YdtQcoHK2vrXR/yEtwfZyLOSATGx53XXbiAG8s624G0/YeVeJEhzv06XFr0uZwwVDUC7AuebYg+SKHXw0RRx5ouEXCpzWUG2ka5n/2tcEMbnCUT8i0LE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768265; c=relaxed/simple; bh=ma+w0E6ocwSGzO+YhsXjcwBW6CsGmyczv/5KrNgRbAY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=K4jyLGFXFf7+LoNXUbG/Wpcn8I091eDusCBEHIETnzzQ4wvVD+iuxZXz8Xli9XqCmbx78IY8mboRlMGWNjGlS/axDxuTjE+3gu48xller0M1Od0b/seFyLaFp67tGzcAXyhTDIYUjnMzbC5QuNG8vbBTwQi2lB7DaSttUuKwfVA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 7589B1063; Wed, 5 Feb 2025 07:11:27 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 3AC763F5A1; Wed, 5 Feb 2025 07:11:01 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org Subject: [PATCH v1 13/16] mm: Don't skip arch_sync_kernel_mappings() in error paths Date: Wed, 5 Feb 2025 15:09:53 +0000 Message-ID: <20250205151003.88959-14-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Fix callers that previously skipped calling arch_sync_kernel_mappings() if an error occurred during a pgtable update. The call is still required to sync any pgtable updates that may have occurred prior to hitting the error condition. These are theoretical bugs discovered during code review. Cc: Fixes: 2ba3e6947aed ("mm/vmalloc: track which page-table levels were modifi= ed") Fixes: 0c95cba49255 ("mm: apply_to_pte_range warn and fail if a large pte i= s encountered") Signed-off-by: Ryan Roberts Reviewed-by: Anshuman Khandual --- mm/memory.c | 6 ++++-- mm/vmalloc.c | 4 ++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 539c0f7c6d54..a15f7dd500ea 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3040,8 +3040,10 @@ static int __apply_to_page_range(struct mm_struct *m= m, unsigned long addr, next =3D pgd_addr_end(addr, end); if (pgd_none(*pgd) && !create) continue; - if (WARN_ON_ONCE(pgd_leaf(*pgd))) - return -EINVAL; + if (WARN_ON_ONCE(pgd_leaf(*pgd))) { + err =3D -EINVAL; + break; + } if (!pgd_none(*pgd) && WARN_ON_ONCE(pgd_bad(*pgd))) { if (!create) continue; diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 6111ce900ec4..68950b1824d0 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -604,13 +604,13 @@ static int vmap_small_pages_range_noflush(unsigned lo= ng addr, unsigned long end, mask |=3D PGTBL_PGD_MODIFIED; err =3D vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask); if (err) - return err; + break; } while (pgd++, addr =3D next, addr !=3D end); =20 if (mask & ARCH_PAGE_TABLE_SYNC_MASK) arch_sync_kernel_mappings(start, end); =20 - return 0; + return err; } =20 /* --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id 49F921FC0EE for ; Wed, 5 Feb 2025 15:11:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768268; cv=none; 
b=QCWxAMnwvgBnHLUAYBa6rmW0xrcctXE5xkpoSLQcdj1lN8dltxv3kZZgqg2+VFEH31y5eS6Taaqi9se2j3iSMu9O2qmfB5nu+B2ID+Gdv/0BHoObg/W/TlcxyBt7RRAvT23PF1gA3HZeGOM9g0Er7KDwWXkZXlNL4u/YvFVM1lQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768268; c=relaxed/simple; bh=BT+wtu3Tp2vzbfyLzNuD8q+TT+hTRIjvLDx/eVBe0T8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KndxCvJh75qhHGa61sSkB1PvdTL2/mxWZ3V38U+5FMMxIaZaoUikjrAdB80+skMYfLEG9Qw2EurqOcXhZ1q/PiVnT1Hmu3GlDILQ/7wV9j6tYnQmMDHk3nJaZZXWpOy03yj8/imw4qNlYQuojTRjJqBgU7Jy5yNnmA8a6sMXUuk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 439D7165C; Wed, 5 Feb 2025 07:11:30 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 27B823F5A1; Wed, 5 Feb 2025 07:11:04 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 14/16] mm/vmalloc: Batch arch_sync_kernel_mappings() more efficiently Date: Wed, 5 Feb 2025 15:09:54 +0000 Message-ID: <20250205151003.88959-15-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" When page_shift is greater than PAGE_SHIFT, __vmap_pages_range_noflush() will call vmap_range_noflush() for each individual huge page. But vmap_range_noflush() would previously call arch_sync_kernel_mappings() directly so this would end up being called for every huge page. We can do better than this; refactor the call into the outer __vmap_pages_range_noflush() so that it is only called once for the entire batch operation. This will benefit performance for arm64 which is about to opt in to using the hook.
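Condensed from the diff below (same logic, laid out here for readability), the sync now happens once after the per-huge-page loop instead of inside vmap_range_noflush():

  for (i = 0; i < nr; i += 1U << (page_shift - PAGE_SHIFT)) {
          /* each call now only accumulates modified levels into 'mask' */
          err = vmap_range_noflush(addr, addr + (1UL << page_shift),
                                   page_to_phys(pages[i]), prot,
                                   page_shift, &mask);
          if (err)
                  break;

          addr += 1UL << page_shift;
  }

  if (mask & ARCH_PAGE_TABLE_SYNC_MASK)
          arch_sync_kernel_mappings(start, end);  /* once per batch */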
Signed-off-by: Ryan Roberts Reviewed-by: Anshuman Khandual --- mm/vmalloc.c | 60 ++++++++++++++++++++++++++-------------------------- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 68950b1824d0..50fd44439875 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -285,40 +285,38 @@ static int vmap_p4d_range(pgd_t *pgd, unsigned long a= ddr, unsigned long end, =20 static int vmap_range_noflush(unsigned long addr, unsigned long end, phys_addr_t phys_addr, pgprot_t prot, - unsigned int max_page_shift) + unsigned int max_page_shift, pgtbl_mod_mask *mask) { pgd_t *pgd; - unsigned long start; unsigned long next; int err; - pgtbl_mod_mask mask =3D 0; =20 might_sleep(); BUG_ON(addr >=3D end); =20 - start =3D addr; pgd =3D pgd_offset_k(addr); do { next =3D pgd_addr_end(addr, end); err =3D vmap_p4d_range(pgd, addr, next, phys_addr, prot, - max_page_shift, &mask); + max_page_shift, mask); if (err) break; } while (pgd++, phys_addr +=3D (next - addr), addr =3D next, addr !=3D en= d); =20 - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) - arch_sync_kernel_mappings(start, end); - return err; } =20 int vmap_page_range(unsigned long addr, unsigned long end, phys_addr_t phys_addr, pgprot_t prot) { + pgtbl_mod_mask mask =3D 0; int err; =20 err =3D vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot), - ioremap_max_page_shift); + ioremap_max_page_shift, &mask); + if (mask & ARCH_PAGE_TABLE_SYNC_MASK) + arch_sync_kernel_mappings(addr, end); + flush_cache_vmap(addr, end); if (!err) err =3D kmsan_ioremap_page_range(addr, end, phys_addr, prot, @@ -587,29 +585,24 @@ static int vmap_pages_p4d_range(pgd_t *pgd, unsigned = long addr, } =20 static int vmap_small_pages_range_noflush(unsigned long addr, unsigned lon= g end, - pgprot_t prot, struct page **pages) + pgprot_t prot, struct page **pages, pgtbl_mod_mask *mask) { - unsigned long start =3D addr; pgd_t *pgd; unsigned long next; int err =3D 0; int nr =3D 0; - pgtbl_mod_mask mask =3D 0; =20 BUG_ON(addr >=3D end); pgd =3D pgd_offset_k(addr); do { next =3D pgd_addr_end(addr, end); if (pgd_bad(*pgd)) - mask |=3D PGTBL_PGD_MODIFIED; - err =3D vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, &mask); + *mask |=3D PGTBL_PGD_MODIFIED; + err =3D vmap_pages_p4d_range(pgd, addr, next, prot, pages, &nr, mask); if (err) break; } while (pgd++, addr =3D next, addr !=3D end); =20 - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) - arch_sync_kernel_mappings(start, end); - return err; } =20 @@ -626,26 +619,33 @@ int __vmap_pages_range_noflush(unsigned long addr, un= signed long end, pgprot_t prot, struct page **pages, unsigned int page_shift) { unsigned int i, nr =3D (end - addr) >> PAGE_SHIFT; + unsigned long start =3D addr; + pgtbl_mod_mask mask =3D 0; + int err =3D 0; =20 WARN_ON(page_shift < PAGE_SHIFT); =20 if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) || - page_shift =3D=3D PAGE_SHIFT) - return vmap_small_pages_range_noflush(addr, end, prot, pages); - - for (i =3D 0; i < nr; i +=3D 1U << (page_shift - PAGE_SHIFT)) { - int err; - - err =3D vmap_range_noflush(addr, addr + (1UL << page_shift), - page_to_phys(pages[i]), prot, - page_shift); - if (err) - return err; + page_shift =3D=3D PAGE_SHIFT) { + err =3D vmap_small_pages_range_noflush(addr, end, prot, pages, + &mask); + } else { + for (i =3D 0; i < nr; i +=3D 1U << (page_shift - PAGE_SHIFT)) { + err =3D vmap_range_noflush(addr, + addr + (1UL << page_shift), + page_to_phys(pages[i]), prot, + page_shift, &mask); + if (err) + break; =20 - addr +=3D 1UL << page_shift; + addr +=3D 1UL << 
page_shift; + } } =20 - return 0; + if (mask & ARCH_PAGE_TABLE_SYNC_MASK) + arch_sync_kernel_mappings(start, end); + + return err; } =20 int vmap_pages_range_noflush(unsigned long addr, unsigned long end, --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id D0EFD1FC100 for ; Wed, 5 Feb 2025 15:11:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768271; cv=none; b=uSiqlsP2vHQOqHs1xUAGRL1l/PTeUbzSDvw3dsPXzy1qq0XnFUqWZz/ua98517va7nMBXctL6XYGP6EAt2DkEAFzumxFdEkHsa3qdLfxi2o+ft70z5dkkOlcn058ZLJLAV7wxiemacBPWkd5SlhG5NLRL8xNGEH4LPLu3XHnHZY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768271; c=relaxed/simple; bh=s1eTDNUU5OiX4B1NHUPhtxhYTlSG8CyE+yB06tL5EVo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=PgqammmLu65qo89PDZsF00NGeQVtQFQAE8znh7CCGvjSEGi/V5w9qTmdOfDNv0TqSmsY0cz+OGNsuEBgh7dG39//F2g6ZWTbDK6ilvPEmzahOXXcJFI4ORCYkarw4Klqshw2C5bJENPB3tgeuY1VGz8oMaWOUF8jDKWnay7QB04= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 12D691515; Wed, 5 Feb 2025 07:11:33 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id EA6533F5A1; Wed, 5 Feb 2025 07:11:06 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 15/16] mm: Generalize arch_sync_kernel_mappings() Date: Wed, 5 Feb 2025 15:09:55 +0000 Message-ID: <20250205151003.88959-16-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" arch_sync_kernel_mappings() is an optional hook for arches to allow them to synchonize certain levels of the kernel pgtables after modification. But arm64 could benefit from a hook similar to this, paired with a call prior to starting the batch of modifications. So let's introduce arch_update_kernel_mappings_begin() and arch_update_kernel_mappings_end(). Both have a default implementation which can be overridden by the arch code. The default for the former is a nop, and the default for the latter is to call arch_sync_kernel_mappings(), so the latter replaces previous arch_sync_kernel_mappings() callsites. So by default, the resulting behaviour is unchanged. 
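In other words, generic mm callers end up bracketing their updates as below (a condensed sketch of the callsites updated in this patch):

  pgtbl_mod_mask mask = 0;

  arch_update_kernel_mappings_begin(start, end);

  /* ... walk and modify the kernel pgtables, recording touched levels in 'mask' ... */

  /* Default: calls arch_sync_kernel_mappings() if 'mask' requires it. */
  arch_update_kernel_mappings_end(start, end, mask);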
To avoid include hell, the pgtbl_mod_mask type and its associated macros are moved to their own header. In a future patch, arm64 will opt in to overriding both functions. Signed-off-by: Ryan Roberts --- include/linux/pgtable.h | 24 +---------------- include/linux/pgtable_modmask.h | 32 ++++++++++++++++++++++ include/linux/vmalloc.h | 47 +++++++++++++++++++++++++++++++++ mm/memory.c | 5 ++-- mm/vmalloc.c | 15 ++++++----- 5 files changed, 92 insertions(+), 31 deletions(-) create mode 100644 include/linux/pgtable_modmask.h diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 94d267d02372..7f70786a73b3 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -4,6 +4,7 @@ =20 #include #include +#include =20 #define PMD_ORDER (PMD_SHIFT - PAGE_SHIFT) #define PUD_ORDER (PUD_SHIFT - PAGE_SHIFT) @@ -1786,29 +1787,6 @@ static inline bool arch_has_pfn_modify_check(void) # define PAGE_KERNEL_EXEC PAGE_KERNEL #endif =20 -/* - * Page Table Modification bits for pgtbl_mod_mask. - * - * These are used by the p?d_alloc_track*() set of functions an in the gen= eric - * vmalloc/ioremap code to track at which page-table levels entries have b= een - * modified. Based on that the code can better decide when vmalloc and ior= emap - * mapping changes need to be synchronized to other page-tables in the sys= tem. - */ -#define __PGTBL_PGD_MODIFIED 0 -#define __PGTBL_P4D_MODIFIED 1 -#define __PGTBL_PUD_MODIFIED 2 -#define __PGTBL_PMD_MODIFIED 3 -#define __PGTBL_PTE_MODIFIED 4 - -#define PGTBL_PGD_MODIFIED BIT(__PGTBL_PGD_MODIFIED) -#define PGTBL_P4D_MODIFIED BIT(__PGTBL_P4D_MODIFIED) -#define PGTBL_PUD_MODIFIED BIT(__PGTBL_PUD_MODIFIED) -#define PGTBL_PMD_MODIFIED BIT(__PGTBL_PMD_MODIFIED) -#define PGTBL_PTE_MODIFIED BIT(__PGTBL_PTE_MODIFIED) - -/* Page-Table Modification Mask */ -typedef unsigned int pgtbl_mod_mask; - #endif /* !__ASSEMBLY__ */ =20 #if !defined(MAX_POSSIBLE_PHYSMEM_BITS) && !defined(CONFIG_64BIT) diff --git a/include/linux/pgtable_modmask.h b/include/linux/pgtable_modmas= k.h new file mode 100644 index 000000000000..5a21b1bb8df3 --- /dev/null +++ b/include/linux/pgtable_modmask.h @@ -0,0 +1,32 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _LINUX_PGTABLE_MODMASK_H +#define _LINUX_PGTABLE_MODMASK_H + +#ifndef __ASSEMBLY__ + +/* + * Page Table Modification bits for pgtbl_mod_mask. + * + * These are used by the p?d_alloc_track*() set of functions an in the gen= eric + * vmalloc/ioremap code to track at which page-table levels entries have b= een + * modified. Based on that the code can better decide when vmalloc and ior= emap + * mapping changes need to be synchronized to other page-tables in the sys= tem.
+ */ +#define __PGTBL_PGD_MODIFIED 0 +#define __PGTBL_P4D_MODIFIED 1 +#define __PGTBL_PUD_MODIFIED 2 +#define __PGTBL_PMD_MODIFIED 3 +#define __PGTBL_PTE_MODIFIED 4 + +#define PGTBL_PGD_MODIFIED BIT(__PGTBL_PGD_MODIFIED) +#define PGTBL_P4D_MODIFIED BIT(__PGTBL_P4D_MODIFIED) +#define PGTBL_PUD_MODIFIED BIT(__PGTBL_PUD_MODIFIED) +#define PGTBL_PMD_MODIFIED BIT(__PGTBL_PMD_MODIFIED) +#define PGTBL_PTE_MODIFIED BIT(__PGTBL_PTE_MODIFIED) + +/* Page-Table Modification Mask */ +typedef unsigned int pgtbl_mod_mask; + +#endif /* !__ASSEMBLY__ */ + +#endif /* _LINUX_PGTABLE_MODMASK_H */ diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 16dd4cba64f2..cb5d8f1965a1 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -11,6 +11,7 @@ #include /* pgprot_t */ #include #include +#include =20 #include =20 @@ -213,6 +214,26 @@ extern int remap_vmalloc_range(struct vm_area_struct *= vma, void *addr, int vmap_pages_range(unsigned long addr, unsigned long end, pgprot_t prot, struct page **pages, unsigned int page_shift); =20 +#ifndef arch_update_kernel_mappings_begin +/** + * arch_update_kernel_mappings_begin - A batch of kernel pgtable mappings = are + * about to be updated. + * @start: Virtual address of start of range to be updated. + * @end: Virtual address of end of range to be updated. + * + * An optional hook to allow architecture code to prepare for a batch of k= ernel + * pgtable mapping updates. An architecture may use this to enter a lazy m= ode + * where some operations can be deferred until the end of the batch. + * + * Context: Called in task context and may be preemptible. + */ +static inline void arch_update_kernel_mappings_begin(unsigned long start, + unsigned long end) +{ +} +#endif + +#ifndef arch_update_kernel_mappings_end /* * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED = values * and let generic vmalloc and ioremap code know when arch_sync_kernel_map= pings() @@ -229,6 +250,32 @@ int vmap_pages_range(unsigned long addr, unsigned long= end, pgprot_t prot, */ void arch_sync_kernel_mappings(unsigned long start, unsigned long end); =20 +/** + * arch_update_kernel_mappings_end - A batch of kernel pgtable mappings ha= ve + * been updated. + * @start: Virtual address of start of range that was updated. + * @end: Virtual address of end of range that was updated. + * + * An optional hook to inform architecture code that a batch update is com= plete. + * This balances a previous call to arch_update_kernel_mappings_begin(). + * + * An architecture may override this for any purpose, such as exiting a la= zy + * mode previously entered with arch_update_kernel_mappings_begin() or syn= cing + * kernel mappings to a secondary pgtable. The default implementation call= s an + * arch-provided arch_sync_kernel_mappings() if any arch-defined pgtable l= evel + * was updated. + * + * Context: Called in task context and may be preemptible. + */ +static inline void arch_update_kernel_mappings_end(unsigned long start, + unsigned long end, + pgtbl_mod_mask mask) +{ + if (mask & ARCH_PAGE_TABLE_SYNC_MASK) + arch_sync_kernel_mappings(start, end); +} +#endif + /* * Lowlevel-APIs (not for driver use!) 
*/ diff --git a/mm/memory.c b/mm/memory.c index a15f7dd500ea..f80930bc19f6 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3035,6 +3035,8 @@ static int __apply_to_page_range(struct mm_struct *mm= , unsigned long addr, if (WARN_ON(addr >=3D end)) return -EINVAL; =20 + arch_update_kernel_mappings_begin(start, end); + pgd =3D pgd_offset(mm, addr); do { next =3D pgd_addr_end(addr, end); @@ -3055,8 +3057,7 @@ static int __apply_to_page_range(struct mm_struct *mm= , unsigned long addr, break; } while (pgd++, addr =3D next, addr !=3D end); =20 - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) - arch_sync_kernel_mappings(start, start + size); + arch_update_kernel_mappings_end(start, end, mask); =20 return err; } diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 50fd44439875..c5c51d86ef78 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -312,10 +312,10 @@ int vmap_page_range(unsigned long addr, unsigned long= end, pgtbl_mod_mask mask =3D 0; int err; =20 + arch_update_kernel_mappings_begin(addr, end); err =3D vmap_range_noflush(addr, end, phys_addr, pgprot_nx(prot), ioremap_max_page_shift, &mask); - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) - arch_sync_kernel_mappings(addr, end); + arch_update_kernel_mappings_end(addr, end, mask); =20 flush_cache_vmap(addr, end); if (!err) @@ -463,6 +463,9 @@ void __vunmap_range_noflush(unsigned long start, unsign= ed long end) pgtbl_mod_mask mask =3D 0; =20 BUG_ON(addr >=3D end); + + arch_update_kernel_mappings_begin(start, end); + pgd =3D pgd_offset_k(addr); do { next =3D pgd_addr_end(addr, end); @@ -473,8 +476,7 @@ void __vunmap_range_noflush(unsigned long start, unsign= ed long end) vunmap_p4d_range(pgd, addr, next, &mask); } while (pgd++, addr =3D next, addr !=3D end); =20 - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) - arch_sync_kernel_mappings(start, end); + arch_update_kernel_mappings_end(start, end, mask); } =20 void vunmap_range_noflush(unsigned long start, unsigned long end) @@ -625,6 +627,8 @@ int __vmap_pages_range_noflush(unsigned long addr, unsi= gned long end, =20 WARN_ON(page_shift < PAGE_SHIFT); =20 + arch_update_kernel_mappings_begin(start, end); + if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) || page_shift =3D=3D PAGE_SHIFT) { err =3D vmap_small_pages_range_noflush(addr, end, prot, pages, @@ -642,8 +646,7 @@ int __vmap_pages_range_noflush(unsigned long addr, unsi= gned long end, } } =20 - if (mask & ARCH_PAGE_TABLE_SYNC_MASK) - arch_sync_kernel_mappings(start, end); + arch_update_kernel_mappings_end(start, end, mask); =20 return err; } --=20 2.43.0 From nobody Sun Dec 14 21:33:56 2025 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by smtp.subspace.kernel.org (Postfix) with ESMTP id DA7E11FC11F for ; Wed, 5 Feb 2025 15:11:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=217.140.110.172 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768274; cv=none; b=E6BEdEWS5hbzTRLR7dKWnX3QH8Mm0KY75mwT9HuDn1KWk4UROXRIi/PHzPKsGkISgb9MXwFQIsA+hxM0KznWYCbG+KW15oc2Irvq2SLOUprCKC1/77TYz9tpXIuoyTjcasTbajbfGeR6o98MQ7GfQQtvyhheJ2C+HxzV3JOvADc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1738768274; c=relaxed/simple; bh=W6SvAOoym1M5wdIH4dVoO/jYRg3H1it2TklUjym4gEE=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=L9/5CxeyFPJH/eGWgnc8Q6vVEAIcDXXFrFAqZ7Morv0mKHxgozJLEpIgSFJi+/v3UO39fDJT3ACbtgQFLEPkGfFAhknt3jKSTh5LizjLmHy3mLOVkn9Ys/8AUmj2P40pgy0uDdV9CUmFd/bV78QC3kY6QKIPjORRRx8RtVkH20M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; 
dmarc=pass (p=none dis=none) header.from=arm.com; spf=pass smtp.mailfrom=arm.com; arc=none smtp.client-ip=217.140.110.172 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=arm.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id D55E91063; Wed, 5 Feb 2025 07:11:35 -0800 (PST) Received: from e125769.cambridge.arm.com (e125769.cambridge.arm.com [10.1.196.27]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id B93063F5A1; Wed, 5 Feb 2025 07:11:09 -0800 (PST) From: Ryan Roberts To: Catalin Marinas , Will Deacon , Muchun Song , Pasha Tatashin , Andrew Morton , Uladzislau Rezki , Christoph Hellwig , Mark Rutland , Ard Biesheuvel , Anshuman Khandual , Dev Jain , Alexandre Ghiti , Steve Capper , Kevin Brodsky Cc: Ryan Roberts , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org Subject: [PATCH v1 16/16] arm64/mm: Defer barriers when updating kernel mappings Date: Wed, 5 Feb 2025 15:09:56 +0000 Message-ID: <20250205151003.88959-17-ryan.roberts@arm.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20250205151003.88959-1-ryan.roberts@arm.com> References: <20250205151003.88959-1-ryan.roberts@arm.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Because the kernel can't tolerate page faults for kernel mappings, when setting a valid, kernel space pte (or pmd/pud/p4d/pgd), it emits a dsb(ishst) to ensure that the store to the pgtable is observed by the table walker immediately. Additionally it emits an isb() to ensure that any already speculatively determined invalid mapping fault gets canceled. We can improve the performance of vmalloc operations by batching these barriers until the end of a set of entry updates. The newly added arch_update_kernel_mappings_begin() / arch_update_kernel_mappings_end() provide the required hooks. vmalloc improves by up to 30% as a result. Two new TIF_ flags are created; TIF_KMAP_UPDATE_ACTIVE tells us if we are in batch mode and can therefore defer any barriers until the end of the batch. TIF_KMAP_UPDATE_PENDING tells us if barriers are queued to be emitted at the end of the batch.
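In outline (condensed from the hunks below), the per-level setters defer their barriers via:

  static inline void queue_pte_barriers(void)
  {
          if (test_thread_flag(TIF_KMAP_UPDATE_ACTIVE)) {
                  /* Inside a batch: just note that barriers are owed... */
                  if (!test_thread_flag(TIF_KMAP_UPDATE_PENDING))
                          set_thread_flag(TIF_KMAP_UPDATE_PENDING);
          } else {
                  /* ...otherwise emit dsb(ishst) + isb() immediately. */
                  emit_pte_barriers();
          }
  }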
Signed-off-by: Ryan Roberts
---
 arch/arm64/include/asm/pgtable.h     | 65 +++++++++++++++++++---------
 arch/arm64/include/asm/thread_info.h |  2 +
 arch/arm64/kernel/process.c          | 20 +++++++--
 3 files changed, 63 insertions(+), 24 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index ff358d983583..1ee9b9588502 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -39,6 +39,41 @@
 #include
 #include
 #include
+#include
+
+static inline void emit_pte_barriers(void)
+{
+	dsb(ishst);
+	isb();
+}
+
+static inline void queue_pte_barriers(void)
+{
+	if (test_thread_flag(TIF_KMAP_UPDATE_ACTIVE)) {
+		if (!test_thread_flag(TIF_KMAP_UPDATE_PENDING))
+			set_thread_flag(TIF_KMAP_UPDATE_PENDING);
+	} else
+		emit_pte_barriers();
+}
+
+#define arch_update_kernel_mappings_begin arch_update_kernel_mappings_begin
+static inline void arch_update_kernel_mappings_begin(unsigned long start,
+						     unsigned long end)
+{
+	set_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
+}
+
+#define arch_update_kernel_mappings_end arch_update_kernel_mappings_end
+static inline void arch_update_kernel_mappings_end(unsigned long start,
+						   unsigned long end,
+						   pgtbl_mod_mask mask)
+{
+	if (test_thread_flag(TIF_KMAP_UPDATE_PENDING))
+		emit_pte_barriers();
+
+	clear_thread_flag(TIF_KMAP_UPDATE_PENDING);
+	clear_thread_flag(TIF_KMAP_UPDATE_ACTIVE);
+}
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
@@ -323,10 +358,8 @@ static inline void __set_pte_complete(pte_t pte)
 	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
 	 * or update_mmu_cache() have the necessary barriers.
 	 */
-	if (pte_valid_not_user(pte)) {
-		dsb(ishst);
-		isb();
-	}
+	if (pte_valid_not_user(pte))
+		queue_pte_barriers();
 }
 
 static inline void __set_pte(pte_t *ptep, pte_t pte)
@@ -791,10 +824,8 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 	WRITE_ONCE(*pmdp, pmd);
 
-	if (pmd_valid_not_user(pmd)) {
-		dsb(ishst);
-		isb();
-	}
+	if (pmd_valid_not_user(pmd))
+		queue_pte_barriers();
 }
 
 static inline void pmd_clear(pmd_t *pmdp)
@@ -869,10 +900,8 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
 
 	WRITE_ONCE(*pudp, pud);
 
-	if (pud_valid_not_user(pud)) {
-		dsb(ishst);
-		isb();
-	}
+	if (pud_valid_not_user(pud))
+		queue_pte_barriers();
 }
 
 static inline void pud_clear(pud_t *pudp)
@@ -960,10 +989,8 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 
 	WRITE_ONCE(*p4dp, p4d);
 
-	if (p4d_valid_not_user(p4d)) {
-		dsb(ishst);
-		isb();
-	}
+	if (p4d_valid_not_user(p4d))
+		queue_pte_barriers();
 }
 
 static inline void p4d_clear(p4d_t *p4dp)
@@ -1098,10 +1125,8 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
 
 	WRITE_ONCE(*pgdp, pgd);
 
-	if (pgd_valid_not_user(pgd)) {
-		dsb(ishst);
-		isb();
-	}
+	if (pgd_valid_not_user(pgd))
+		queue_pte_barriers();
 }
 
 static inline void pgd_clear(pgd_t *pgdp)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 1114c1c3300a..382d2121261e 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -82,6 +82,8 @@ void arch_setup_new_exec(void);
 #define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
 #define TIF_KERNEL_FPSTATE	29	/* Task is in a kernel mode FPSIMD section */
 #define TIF_TSC_SIGSEGV		30	/* SIGSEGV on counter-timer access */
+#define TIF_KMAP_UPDATE_ACTIVE	31	/* kernel map update in progress */
+#define TIF_KMAP_UPDATE_PENDING	32	/* kernel map updated with deferred barriers */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 42faebb7b712..1367ec6407d1 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -680,10 +680,10 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	gcs_thread_switch(next);
 
 	/*
-	 * Complete any pending TLB or cache maintenance on this CPU in case
-	 * the thread migrates to a different CPU.
-	 * This full barrier is also required by the membarrier system
-	 * call.
+	 * Complete any pending TLB or cache maintenance on this CPU in case the
+	 * thread migrates to a different CPU. This full barrier is also
+	 * required by the membarrier system call. Additionally it is required
+	 * for TIF_KMAP_UPDATE_PENDING, see below.
 	 */
 	dsb(ish);
 
@@ -696,6 +696,18 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	/* avoid expensive SCTLR_EL1 accesses if no change */
 	if (prev->thread.sctlr_user != next->thread.sctlr_user)
 		update_sctlr_el1(next->thread.sctlr_user);
+	else if (unlikely(test_thread_flag(TIF_KMAP_UPDATE_PENDING))) {
+		/*
+		 * In the unlikely event that a kernel map update is ongoing when
+		 * preemption occurs, we must emit_pte_barriers() if pending.
+		 * emit_pte_barriers() consists of "dsb(ishst); isb();". The dsb
+		 * is already handled above. The isb() is handled if
+		 * update_sctlr_el1() was called. So we only need to emit isb()
+		 * here if it wasn't called.
+		 */
+		isb();
+		clear_thread_flag(TIF_KMAP_UPDATE_PENDING);
+	}
 
 	/* the actual thread switch */
 	last = cpu_switch_to(prev, next);
-- 
2.43.0
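Taken together with the previous patch, the net flow for a batched kernel
mapping update on arm64 looks roughly like the sketch below. This is an
illustrative walk-through using the names from the hunks above, not code from
the series; the helper name is hypothetical.

/*
 * Illustrative flow only (helper name is made up). Shows how the two hooks
 * bracket a page table walk so that at most one dsb(ishst); isb() pair is
 * emitted for the whole batch.
 */
static void example_batched_kernel_mapping_update(unsigned long start,
						  unsigned long end)
{
	pgtbl_mod_mask mask = 0;

	/*
	 * Sets TIF_KMAP_UPDATE_ACTIVE. From here on, every kernel
	 * pte/pmd/pud/p4d/pgd setter calls queue_pte_barriers(), which only
	 * raises TIF_KMAP_UPDATE_PENDING instead of emitting the barriers.
	 */
	arch_update_kernel_mappings_begin(start, end);

	/* ... walk the page tables and write the new entries here ... */

	/*
	 * Emits a single dsb(ishst); isb() pair if TIF_KMAP_UPDATE_PENDING
	 * was raised, then clears both flags. If the task was preempted
	 * mid-batch, __switch_to() already provided the equivalent barriers
	 * and cleared the PENDING flag.
	 */
	arch_update_kernel_mappings_end(start, end, mask);
}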