From: Ryan Roberts <ryan.roberts@arm.com>
To: Catalin Marinas, Will Deacon, Pasha Tatashin, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, David Hildenbrand,
	"Matthew Wilcox (Oracle)", Mark Rutland, Anshuman Khandual,
	Alexandre Ghiti, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: [PATCH v4 01/11] arm64: hugetlb: Cleanup huge_pte size discovery mechanisms
Date: Tue, 22 Apr 2025 09:18:09 +0100
Message-ID: <20250422081822.1836315-2-ryan.roberts@arm.com>
In-Reply-To: <20250422081822.1836315-1-ryan.roberts@arm.com>
References: <20250422081822.1836315-1-ryan.roberts@arm.com>

Not all huge_pte helper APIs explicitly provide the size of the
huge_pte, so those helpers have to determine the size by other means.
Some of the methods currently in use are dubious.

Let's clean up the code to use the preferred methods and retire the
dubious ones. The options, in order of preference:

 - If size is provided as a parameter, use it together with
   num_contig_ptes(). This is explicit and works for both present and
   non-present ptes.

 - If vma is provided as a parameter, retrieve size via
   huge_page_size(hstate_vma(vma)) and use it together with
   num_contig_ptes(). This is explicit and works for both present and
   non-present ptes.

 - If the pte is present and contiguous, use find_num_contig() to walk
   the pgtable, find the level, and infer the number of ptes from the
   level. This only works for *present* ptes.

 - If the pte is present and not contiguous, infer that only 1 pte
   needs to be operated on.
   This is fine if you don't care about the
   absolute size and only need the number of ptes.

 - NEVER rely on resolving the PFN of a present pte to a folio and
   reading the folio's size. This is fragile at best, because nothing
   stops the core-mm from allocating a folio twice as big as the
   huge_pte and mapping it across 2 consecutive huge_ptes, or from
   mapping it only partially.

Where we require that the pte is present, add warnings for the
non-present case.
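For illustration, here is a minimal userspace model of the preferred
num_contig_ptes() path (the constants assume a 4K granule; this is a
sketch, not the kernel implementation):

#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PAGE_SIZE	0x1000UL	/* assumed 4K granule */
#define CONT_PTE_SIZE	0x10000UL	/* 16 ptes */
#define PMD_SIZE	0x200000UL
#define CONT_PMD_SIZE	0x2000000UL	/* 16 pmds */

/* Given an explicit size, derive the per-entry pgsize and entry count. */
static int num_contig_ptes(unsigned long size, size_t *pgsize)
{
	int contig_ptes = 1;

	*pgsize = size;

	switch (size) {
	case CONT_PMD_SIZE:
		*pgsize = PMD_SIZE;
		contig_ptes = CONT_PMD_SIZE / PMD_SIZE;		/* 16 */
		break;
	case CONT_PTE_SIZE:
		*pgsize = PAGE_SIZE;
		contig_ptes = CONT_PTE_SIZE / PAGE_SIZE;	/* 16 */
		break;
	}
	return contig_ptes;
}

int main(void)
{
	size_t pgsize;

	assert(num_contig_ptes(CONT_PTE_SIZE, &pgsize) == 16 && pgsize == PAGE_SIZE);
	assert(num_contig_ptes(PMD_SIZE, &pgsize) == 1 && pgsize == PMD_SIZE);
	printf("ok\n");
	return 0;
}

Because the size is passed in explicitly, this works whether or not the
pte is present, which is why it tops the preference list above.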
Reviewed-by: Catalin Marinas
Reviewed-by: Anshuman Khandual
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 arch/arm64/mm/hugetlbpage.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index cfe8cb8ba1cc..701394aa7734 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -129,7 +129,7 @@ pte_t huge_ptep_get(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
 	if (!pte_present(orig_pte) || !pte_cont(orig_pte))
 		return orig_pte;

-	ncontig = num_contig_ptes(page_size(pte_page(orig_pte)), &pgsize);
+	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
 	for (i = 0; i < ncontig; i++, ptep++) {
 		pte_t pte = __ptep_get(ptep);

@@ -438,16 +438,19 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	pgprot_t hugeprot;
 	pte_t orig_pte;

+	VM_WARN_ON(!pte_present(pte));
+
 	if (!pte_cont(pte))
 		return __ptep_set_access_flags(vma, addr, ptep, pte, dirty);

-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+	ncontig = num_contig_ptes(huge_page_size(hstate_vma(vma)), &pgsize);
 	dpfn = pgsize >> PAGE_SHIFT;

 	if (!__cont_access_flags_changed(ptep, pte, ncontig))
 		return 0;

 	orig_pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
+	VM_WARN_ON(!pte_present(orig_pte));

 	/* Make sure we don't lose the dirty or young state */
 	if (pte_dirty(orig_pte))
@@ -472,7 +475,10 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	size_t pgsize;
 	pte_t pte;

-	if (!pte_cont(__ptep_get(ptep))) {
+	pte = __ptep_get(ptep);
+	VM_WARN_ON(!pte_present(pte));
+
+	if (!pte_cont(pte)) {
 		__ptep_set_wrprotect(mm, addr, ptep);
 		return;
 	}
@@ -496,11 +502,15 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	size_t pgsize;
 	int ncontig;
+	pte_t pte;
+
+	pte = __ptep_get(ptep);
+	VM_WARN_ON(!pte_present(pte));

-	if (!pte_cont(__ptep_get(ptep)))
+	if (!pte_cont(pte))
 		return ptep_clear_flush(vma, addr, ptep);

-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
+	ncontig = num_contig_ptes(huge_page_size(hstate_vma(vma)), &pgsize);
 	return get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
 }

-- 
2.43.0

From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 02/11] arm64: hugetlb: Refine tlb maintenance scope
Date: Tue, 22 Apr 2025 09:18:10 +0100
Message-ID: <20250422081822.1836315-3-ryan.roberts@arm.com>

When operating on contiguous blocks of ptes (or pmds) for some hugetlb
sizes, we must honour break-before-make requirements: clear the block
down to invalid state in the pgtable, then invalidate the relevant tlb
entries, before making the pgtable entries valid again.

However, the tlb maintenance is currently always done assuming the
worst case stride (PAGE_SIZE), last_level (false) and tlb_level
(TLBI_TTL_UNKNOWN). We can do much better with hinting: in reality, we
know the stride from the huge_pte pgsize, we are always operating only
on the last level, and we always know the tlb_level, again based on
pgsize. So let's start providing these hints.

Additionally, avoid tlb maintenance in set_huge_pte_at().
Break-before-make is only required if we are transitioning the
contiguous pte block from valid -> valid. So let's elide the
clear-and-flush ("break") if the pte range was previously invalid.
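For illustration, a userspace sketch of how the huge_pte pgsize maps to
the TLBI stride and translation-table level hints, mirroring
__flush_hugetlb_tlb_range() in the patch below (4K-granule constants
and the TLBI_TTL_UNKNOWN stand-in value are assumptions; this is not
the kernel code):

#include <assert.h>
#include <stdio.h>

#define PAGE_SIZE	0x1000UL
#define CONT_PTE_SIZE	0x10000UL
#define PMD_SIZE	0x200000UL
#define CONT_PMD_SIZE	0x2000000UL
#define PUD_SIZE	0x40000000UL
#define TLBI_TTL_UNKNOWN 0	/* illustrative stand-in */

static void tlbi_hints(unsigned long pgsize, unsigned long *stride, int *level)
{
	switch (pgsize) {
	case PUD_SIZE:
		*stride = PUD_SIZE;  *level = 1; break;
	case CONT_PMD_SIZE:	/* contig pmds are still level-2 entries */
	case PMD_SIZE:
		*stride = PMD_SIZE;  *level = 2; break;
	case CONT_PTE_SIZE:	/* contig ptes are still level-3 entries */
		*stride = PAGE_SIZE; *level = 3; break;
	default:
		*stride = PAGE_SIZE; *level = TLBI_TTL_UNKNOWN; break;
	}
}

int main(void)
{
	unsigned long stride;
	int level;

	tlbi_hints(CONT_PTE_SIZE, &stride, &level);
	assert(stride == PAGE_SIZE && level == 3);
	tlbi_hints(CONT_PMD_SIZE, &stride, &level);
	assert(stride == PMD_SIZE && level == 2);
	printf("ok\n");
	return 0;
}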
Reviewed-by: Catalin Marinas
Reviewed-by: Anshuman Khandual
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 arch/arm64/include/asm/hugetlb.h | 29 +++++++++++++++++++----------
 arch/arm64/mm/hugetlbpage.c      |  9 ++++++---
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 07fbf5bf85a7..2a8155c4a882 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -69,29 +69,38 @@ extern void huge_ptep_modify_prot_commit(struct vm_area_struct *vma,

 #include <asm-generic/hugetlb.h>

-#define __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
-static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
-					   unsigned long start,
-					   unsigned long end)
+static inline void __flush_hugetlb_tlb_range(struct vm_area_struct *vma,
+					     unsigned long start,
+					     unsigned long end,
+					     unsigned long stride,
+					     bool last_level)
 {
-	unsigned long stride = huge_page_size(hstate_vma(vma));
-
 	switch (stride) {
 #ifndef __PAGETABLE_PMD_FOLDED
 	case PUD_SIZE:
-		__flush_tlb_range(vma, start, end, PUD_SIZE, false, 1);
+		__flush_tlb_range(vma, start, end, PUD_SIZE, last_level, 1);
 		break;
 #endif
 	case CONT_PMD_SIZE:
 	case PMD_SIZE:
-		__flush_tlb_range(vma, start, end, PMD_SIZE, false, 2);
+		__flush_tlb_range(vma, start, end, PMD_SIZE, last_level, 2);
 		break;
 	case CONT_PTE_SIZE:
-		__flush_tlb_range(vma, start, end, PAGE_SIZE, false, 3);
+		__flush_tlb_range(vma, start, end, PAGE_SIZE, last_level, 3);
 		break;
 	default:
-		__flush_tlb_range(vma, start, end, PAGE_SIZE, false, TLBI_TTL_UNKNOWN);
+		__flush_tlb_range(vma, start, end, PAGE_SIZE, last_level, TLBI_TTL_UNKNOWN);
 	}
 }

+#define __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
+static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
+					   unsigned long start,
+					   unsigned long end)
+{
+	unsigned long stride = huge_page_size(hstate_vma(vma));
+
+	__flush_hugetlb_tlb_range(vma, start, end, stride, false);
+}
+
 #endif /* __ASM_HUGETLB_H */
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 701394aa7734..087fc43381c6 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -183,8 +183,9 @@ static pte_t get_clear_contig_flush(struct mm_struct *mm,
 {
 	pte_t orig_pte = get_clear_contig(mm, addr, ptep, pgsize, ncontig);
 	struct vm_area_struct vma = TLB_FLUSH_VMA(mm, 0);
+	unsigned long end = addr + (pgsize * ncontig);

-	flush_tlb_range(&vma, addr, addr + (pgsize * ncontig));
+	__flush_hugetlb_tlb_range(&vma, addr, end, pgsize, true);
 	return orig_pte;
 }

@@ -209,7 +210,7 @@ static void clear_flush(struct mm_struct *mm,
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
 		__ptep_get_and_clear(mm, addr, ptep);

-	flush_tlb_range(&vma, saddr, addr);
+	__flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true);
 }

 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
@@ -238,7 +239,9 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	dpfn = pgsize >> PAGE_SHIFT;
 	hugeprot = pte_pgprot(pte);

-	clear_flush(mm, addr, ptep, pgsize, ncontig);
+	/* Only need to "break" if transitioning valid -> valid. */
+	if (pte_valid(__ptep_get(ptep)))
+		clear_flush(mm, addr, ptep, pgsize, ncontig);

 	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
 		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
-- 
2.43.0

From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 03/11] mm/page_table_check: Batch-check pmds/puds just like ptes
Date: Tue, 22 Apr 2025 09:18:11 +0100
Message-ID: <20250422081822.1836315-4-ryan.roberts@arm.com>

Convert page_table_check_p[mu]d_set(...) to
page_table_check_p[mu]ds_set(..., nr) to allow checking a contiguous
set of pmds/puds in a single batch. We retain
page_table_check_p[mu]d_set(...) as macros that call the new batch
functions with nr=1 for compatibility.

arm64 is about to reorganise its pte/pmd/pud helpers to reuse more code
and to allow the implementation for huge_pte to more efficiently set
ptes/pmds/puds in batches. We need these batch-helpers to make the
refactoring possible.
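For illustration, a standalone sketch of the batched-check shape
(hypothetical types and accounting, not the kernel implementation): one
call validates nr consecutive entries, clearing the accounting for each
old entry and then accounting stride * nr pages for the new mapping,
while the singular helper survives as an nr == 1 macro:

#include <stdio.h>

typedef struct { unsigned long val; } pmd_t;	/* stand-in type */
#define PMD_STRIDE 512UL	/* PMD_SIZE >> PAGE_SHIFT on a 4K granule */

static void check_pmds_set(pmd_t *pmdp, pmd_t pmd, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++)
		printf("clear accounting for old pmd[%u] (val=%lu)\n",
		       i, pmdp[i].val);
	/* one accounting update covers the whole batch */
	printf("account %lu pages for new mapping\n", PMD_STRIDE * nr);
	(void)pmd;
}
#define check_pmd_set(pmdp, pmd) check_pmds_set(pmdp, pmd, 1)

int main(void)
{
	pmd_t pmds[16] = { { 0 } };

	check_pmds_set(pmds, (pmd_t){ 1 }, 16);	/* one batch of 16 */
	check_pmd_set(pmds, (pmd_t){ 1 });	/* legacy single-entry form */
	return 0;
}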
Reviewed-by: Anshuman Khandual
Reviewed-by: Pasha Tatashin
Reviewed-by: Catalin Marinas
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 include/linux/page_table_check.h | 30 +++++++++++++++++-----------
 mm/page_table_check.c            | 34 +++++++++++++++++++-------------
 2 files changed, 38 insertions(+), 26 deletions(-)

diff --git a/include/linux/page_table_check.h b/include/linux/page_table_check.h
index 6722941c7cb8..289620d4aad3 100644
--- a/include/linux/page_table_check.h
+++ b/include/linux/page_table_check.h
@@ -19,8 +19,10 @@ void __page_table_check_pmd_clear(struct mm_struct *mm, pmd_t pmd);
 void __page_table_check_pud_clear(struct mm_struct *mm, pud_t pud);
 void __page_table_check_ptes_set(struct mm_struct *mm, pte_t *ptep, pte_t pte,
 				 unsigned int nr);
-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd);
-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud);
+void __page_table_check_pmds_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd,
+				 unsigned int nr);
+void __page_table_check_puds_set(struct mm_struct *mm, pud_t *pudp, pud_t pud,
+				 unsigned int nr);
 void __page_table_check_pte_clear_range(struct mm_struct *mm,
 					unsigned long addr,
 					pmd_t pmd);
@@ -74,22 +76,22 @@ static inline void page_table_check_ptes_set(struct mm_struct *mm,
 	__page_table_check_ptes_set(mm, ptep, pte, nr);
 }

-static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
-					    pmd_t pmd)
+static inline void page_table_check_pmds_set(struct mm_struct *mm,
+		pmd_t *pmdp, pmd_t pmd, unsigned int nr)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pmd_set(mm, pmdp, pmd);
+	__page_table_check_pmds_set(mm, pmdp, pmd, nr);
 }

-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
-					    pud_t pud)
+static inline void page_table_check_puds_set(struct mm_struct *mm,
+		pud_t *pudp, pud_t pud, unsigned int nr)
 {
 	if (static_branch_likely(&page_table_check_disabled))
 		return;

-	__page_table_check_pud_set(mm, pudp, pud);
+	__page_table_check_puds_set(mm, pudp, pud, nr);
 }

 static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
@@ -129,13 +131,13 @@ static inline void page_table_check_ptes_set(struct mm_struct *mm,
 {
 }

-static inline void page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp,
-					    pmd_t pmd)
+static inline void page_table_check_pmds_set(struct mm_struct *mm,
+		pmd_t *pmdp, pmd_t pmd, unsigned int nr)
 {
 }

-static inline void page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp,
-					    pud_t pud)
+static inline void page_table_check_puds_set(struct mm_struct *mm,
+		pud_t *pudp, pud_t pud, unsigned int nr)
 {
 }

@@ -146,4 +148,8 @@ static inline void page_table_check_pte_clear_range(struct mm_struct *mm,
 }

 #endif /* CONFIG_PAGE_TABLE_CHECK */
+
+#define page_table_check_pmd_set(mm, pmdp, pmd) page_table_check_pmds_set(mm, pmdp, pmd, 1)
+#define page_table_check_pud_set(mm, pudp, pud) page_table_check_puds_set(mm, pudp, pud, 1)
+
 #endif /* __LINUX_PAGE_TABLE_CHECK_H */
diff --git a/mm/page_table_check.c b/mm/page_table_check.c
index 68109ee93841..4eeca782b888 100644
--- a/mm/page_table_check.c
+++ b/mm/page_table_check.c
@@ -218,33 +218,39 @@ static inline void page_table_check_pmd_flags(pmd_t pmd)
 		WARN_ON_ONCE(swap_cached_writable(pmd_to_swp_entry(pmd)));
 }

-void __page_table_check_pmd_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd)
+void __page_table_check_pmds_set(struct mm_struct *mm, pmd_t *pmdp, pmd_t pmd,
+				 unsigned int nr)
 {
+	unsigned long stride = PMD_SIZE >> PAGE_SHIFT;
+	unsigned int i;
+
 	if (&init_mm == mm)
 		return;

 	page_table_check_pmd_flags(pmd);

-	__page_table_check_pmd_clear(mm, *pmdp);
-	if (pmd_user_accessible_page(pmd)) {
-		page_table_check_set(pmd_pfn(pmd), PMD_SIZE >> PAGE_SHIFT,
-				     pmd_write(pmd));
-	}
+	for (i = 0; i < nr; i++)
+		__page_table_check_pmd_clear(mm, *(pmdp + i));
+	if (pmd_user_accessible_page(pmd))
+		page_table_check_set(pmd_pfn(pmd), stride * nr, pmd_write(pmd));
 }
-EXPORT_SYMBOL(__page_table_check_pmd_set);
+EXPORT_SYMBOL(__page_table_check_pmds_set);

-void __page_table_check_pud_set(struct mm_struct *mm, pud_t *pudp, pud_t pud)
+void __page_table_check_puds_set(struct mm_struct *mm, pud_t *pudp, pud_t pud,
+				 unsigned int nr)
 {
+	unsigned long stride = PUD_SIZE >> PAGE_SHIFT;
+	unsigned int i;
+
 	if (&init_mm == mm)
 		return;

-	__page_table_check_pud_clear(mm, *pudp);
-	if (pud_user_accessible_page(pud)) {
-		page_table_check_set(pud_pfn(pud), PUD_SIZE >> PAGE_SHIFT,
-				     pud_write(pud));
-	}
+	for (i = 0; i < nr; i++)
+		__page_table_check_pud_clear(mm, *(pudp + i));
+	if (pud_user_accessible_page(pud))
+		page_table_check_set(pud_pfn(pud), stride * nr, pud_write(pud));
 }
-EXPORT_SYMBOL(__page_table_check_pud_set);
+EXPORT_SYMBOL(__page_table_check_puds_set);

 void __page_table_check_pte_clear_range(struct mm_struct *mm,
 					unsigned long addr,
-- 
2.43.0
From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 04/11] arm64/mm: Refactor __set_ptes() and __ptep_get_and_clear()
Date: Tue, 22 Apr 2025 09:18:12 +0100
Message-ID: <20250422081822.1836315-5-ryan.roberts@arm.com>

Refactor __set_ptes(), set_pmd_at() and set_pud_at() so that they are
all thin wrappers around a new common __set_ptes_anysz(), which takes a
pgsize parameter. Additionally, refactor __ptep_get_and_clear() and
pmdp_huge_get_and_clear() to use a new common
__ptep_get_and_clear_anysz() which also takes a pgsize parameter.

These changes will permit the huge_pte API to efficiently batch-set
pgtable entries and take advantage of the future barrier optimizations.
Additionally, since the new *_anysz() helpers call the correct
page_table_check_*_set() API based on pgsize, huge_ptes will be able to
get proper coverage. Currently the huge_pte API always uses the pte
API, which assumes an entry only covers a single page.
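For illustration, a userspace model of the pfn-advance loop at the
heart of __set_ptes_anysz(): successive entries advance the pfn by
stride = pgsize / PAGE_SIZE, so one helper serves pte (stride 1), pmd
(stride 512) and pud (stride 262144) entries on a 4K granule (sizes
assumed; sketch only):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT 12	/* assumed 4K granule */

static void set_ptes_anysz(unsigned long *entries, unsigned long pfn,
			   unsigned int nr, unsigned long pgsize)
{
	unsigned long stride = pgsize >> PAGE_SHIFT;

	for (unsigned int i = 0; i < nr; i++, pfn += stride)
		entries[i] = pfn;	/* stand-in for __set_pte(ptep, pte) */
}

int main(void)
{
	unsigned long pmds[4];

	set_ptes_anysz(pmds, 0x1000, 4, 0x200000UL);	/* 4 pmd entries */
	assert(pmds[1] == 0x1000 + 512);
	assert(pmds[3] == 0x1000 + 3 * 512);
	printf("ok\n");
	return 0;
}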
Reviewed-by: Catalin Marinas
Signed-off-by: Ryan Roberts
Reviewed-by: Anshuman Khandual
Tested-by: Luiz Capitulino
---
 arch/arm64/include/asm/pgtable.h | 114 ++++++++++++++++++++-----------
 1 file changed, 73 insertions(+), 41 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d3b538be1500..d80aa9ba0a16 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -423,23 +423,6 @@ static inline pte_t pte_advance_pfn(pte_t pte, unsigned long nr)
 	return pfn_pte(pte_pfn(pte) + nr, pte_pgprot(pte));
 }

-static inline void __set_ptes(struct mm_struct *mm,
-			      unsigned long __always_unused addr,
-			      pte_t *ptep, pte_t pte, unsigned int nr)
-{
-	page_table_check_ptes_set(mm, ptep, pte, nr);
-	__sync_cache_and_tags(pte, nr);
-
-	for (;;) {
-		__check_safe_pte_update(mm, ptep, pte);
-		__set_pte(ptep, pte);
-		if (--nr == 0)
-			break;
-		ptep++;
-		pte = pte_advance_pfn(pte, 1);
-	}
-}
-
 /*
  * Hugetlb definitions.
  */
@@ -649,30 +632,62 @@ static inline pgprot_t pud_pgprot(pud_t pud)
 	return __pgprot(pud_val(pfn_pud(pfn, __pgprot(0))) ^ pud_val(pud));
 }

-static inline void __set_pte_at(struct mm_struct *mm,
-				unsigned long __always_unused addr,
-				pte_t *ptep, pte_t pte, unsigned int nr)
+static inline void __set_ptes_anysz(struct mm_struct *mm, pte_t *ptep,
+				    pte_t pte, unsigned int nr,
+				    unsigned long pgsize)
 {
-	__sync_cache_and_tags(pte, nr);
-	__check_safe_pte_update(mm, ptep, pte);
-	__set_pte(ptep, pte);
+	unsigned long stride = pgsize >> PAGE_SHIFT;
+
+	switch (pgsize) {
+	case PAGE_SIZE:
+		page_table_check_ptes_set(mm, ptep, pte, nr);
+		break;
+	case PMD_SIZE:
+		page_table_check_pmds_set(mm, (pmd_t *)ptep, pte_pmd(pte), nr);
+		break;
+#ifndef __PAGETABLE_PMD_FOLDED
+	case PUD_SIZE:
+		page_table_check_puds_set(mm, (pud_t *)ptep, pte_pud(pte), nr);
+		break;
+#endif
+	default:
+		VM_WARN_ON(1);
+	}
+
+	__sync_cache_and_tags(pte, nr * stride);
+
+	for (;;) {
+		__check_safe_pte_update(mm, ptep, pte);
+		__set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+		ptep++;
+		pte = pte_advance_pfn(pte, stride);
+	}
+}
+
+static inline void __set_ptes(struct mm_struct *mm,
+			      unsigned long __always_unused addr,
+			      pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	__set_ptes_anysz(mm, ptep, pte, nr, PAGE_SIZE);
 }

-static inline void set_pmd_at(struct mm_struct *mm, unsigned long addr,
-			      pmd_t *pmdp, pmd_t pmd)
+static inline void __set_pmds(struct mm_struct *mm,
+			      unsigned long __always_unused addr,
+			      pmd_t *pmdp, pmd_t pmd, unsigned int nr)
 {
-	page_table_check_pmd_set(mm, pmdp, pmd);
-	return __set_pte_at(mm, addr, (pte_t *)pmdp, pmd_pte(pmd),
-						PMD_SIZE >> PAGE_SHIFT);
+	__set_ptes_anysz(mm, (pte_t *)pmdp, pmd_pte(pmd), nr, PMD_SIZE);
 }
+#define set_pmd_at(mm, addr, pmdp, pmd) __set_pmds(mm, addr, pmdp, pmd, 1)

-static inline void set_pud_at(struct mm_struct *mm, unsigned long addr,
-			      pud_t *pudp, pud_t pud)
+static inline void __set_puds(struct mm_struct *mm,
+			      unsigned long __always_unused addr,
+			      pud_t *pudp, pud_t pud, unsigned int nr)
 {
-	page_table_check_pud_set(mm, pudp, pud);
-	return __set_pte_at(mm, addr, (pte_t *)pudp, pud_pte(pud),
-						PUD_SIZE >> PAGE_SHIFT);
+	__set_ptes_anysz(mm, (pte_t *)pudp, pud_pte(pud), nr, PUD_SIZE);
 }
+#define set_pud_at(mm, addr, pudp, pud) __set_puds(mm, addr, pudp, pud, 1)

 #define __p4d_to_phys(p4d)	__pte_to_phys(p4d_pte(p4d))
 #define __phys_to_p4d_val(phys)	__phys_to_pte_val(phys)
@@ -1301,16 +1316,37 @@ static inline int pmdp_test_and_clear_young(struct vm_area_struct *vma,
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE || CONFIG_ARCH_HAS_NONLEAF_PMD_YOUNG */

-static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
-					 unsigned long address, pte_t *ptep)
+static inline pte_t __ptep_get_and_clear_anysz(struct mm_struct *mm,
+					       pte_t *ptep,
+					       unsigned long pgsize)
 {
 	pte_t pte = __pte(xchg_relaxed(&pte_val(*ptep), 0));

-	page_table_check_pte_clear(mm, pte);
+	switch (pgsize) {
+	case PAGE_SIZE:
+		page_table_check_pte_clear(mm, pte);
+		break;
+	case PMD_SIZE:
+		page_table_check_pmd_clear(mm, pte_pmd(pte));
+		break;
+#ifndef __PAGETABLE_PMD_FOLDED
+	case PUD_SIZE:
+		page_table_check_pud_clear(mm, pte_pud(pte));
+		break;
+#endif
+	default:
+		VM_WARN_ON(1);
+	}

 	return pte;
 }

+static inline pte_t __ptep_get_and_clear(struct mm_struct *mm,
+					 unsigned long address, pte_t *ptep)
+{
+	return __ptep_get_and_clear_anysz(mm, ptep, PAGE_SIZE);
+}
+
 static inline void __clear_full_ptes(struct mm_struct *mm, unsigned long addr,
 				     pte_t *ptep, unsigned int nr, int full)
 {
@@ -1347,11 +1383,7 @@ static inline pte_t __get_and_clear_full_ptes(struct mm_struct *mm,
 static inline pmd_t pmdp_huge_get_and_clear(struct mm_struct *mm,
 					    unsigned long address, pmd_t *pmdp)
 {
-	pmd_t pmd = __pmd(xchg_relaxed(&pmd_val(*pmdp), 0));
-
-	page_table_check_pmd_clear(mm, pmd);
-
-	return pmd;
+	return pte_pmd(__ptep_get_and_clear_anysz(mm, (pte_t *)pmdp, PMD_SIZE));
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */

-- 
2.43.0

From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 05/11] arm64: hugetlb: Use __set_ptes_anysz() and __ptep_get_and_clear_anysz()
Date: Tue, 22 Apr 2025 09:18:13 +0100
Message-ID: <20250422081822.1836315-6-ryan.roberts@arm.com>

Refactor the huge_pte helpers to use the new common __set_ptes_anysz()
and __ptep_get_and_clear_anysz() APIs.

This provides two benefits. First, when page_table_check=on, hugetlb is
now properly/fully checked. Previously only the first page of a hugetlb
folio was checked. Second, instead of having to call __set_ptes(nr=1)
for each pte in a loop, the whole contiguous batch can now be set in
one go, which enables some efficiencies and cleans up the code.

One detail to note is that huge_ptep_clear_flush() was previously
calling ptep_clear_flush() for a non-contiguous pte (i.e. a pud or pmd
block mapping). This has a couple of disadvantages. First,
ptep_clear_flush() calls ptep_get_and_clear(), which transparently
handles contpte. Given we only call it for non-contiguous ptes, it
would be safe, but a waste of effort; it's preferable to go straight to
the layer below. More problematic, however, is that
ptep_get_and_clear() is for PAGE_SIZE entries, so it calls
page_table_check_pte_clear() and would not clear the whole hugetlb
folio. So let's stop special-casing the non-cont case and just rely on
get_clear_contig_flush() to do the right thing for non-cont entries.
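For illustration, a userspace model showing why the pfn/dpfn/hugeprot
bookkeeping can be deleted from the callers: the old per-entry loop and
the new batched call populate identical entries (sketch only, with
plain integers standing in for ptes):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT 12	/* assumed 4K granule */

static void set_ptes_anysz(unsigned long *e, unsigned long pfn,
			   unsigned int nr, unsigned long pgsize)
{
	unsigned long stride = pgsize >> PAGE_SHIFT;

	for (unsigned int i = 0; i < nr; i++, pfn += stride)
		e[i] = pfn;
}

int main(void)
{
	unsigned long old_way[16], new_way[16];
	unsigned long pgsize = 1UL << PAGE_SHIFT;
	unsigned long dpfn = pgsize >> PAGE_SHIFT;
	unsigned long pfn = 42;

	for (unsigned int i = 0; i < 16; i++, pfn += dpfn)	/* old loop */
		old_way[i] = pfn;
	set_ptes_anysz(new_way, 42, 16, pgsize);		/* new batch */

	for (unsigned int i = 0; i < 16; i++)
		assert(old_way[i] == new_way[i]);
	printf("ok\n");
	return 0;
}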
Reviewed-by: Catalin Marinas
Signed-off-by: Ryan Roberts
Reviewed-by: Anshuman Khandual
Tested-by: Luiz Capitulino
---
 arch/arm64/mm/hugetlbpage.c | 53 +++++++------------------------
 1 file changed, 10 insertions(+), 43 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 087fc43381c6..d34703846ef4 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -159,12 +159,11 @@ static pte_t get_clear_contig(struct mm_struct *mm,
 	pte_t pte, tmp_pte;
 	bool present;

-	pte = __ptep_get_and_clear(mm, addr, ptep);
+	pte = __ptep_get_and_clear_anysz(mm, ptep, pgsize);
 	present = pte_present(pte);
 	while (--ncontig) {
 		ptep++;
-		addr += pgsize;
-		tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
+		tmp_pte = __ptep_get_and_clear_anysz(mm, ptep, pgsize);
 		if (present) {
 			if (pte_dirty(tmp_pte))
 				pte = pte_mkdirty(pte);
@@ -208,7 +207,7 @@ static void clear_flush(struct mm_struct *mm,
 	unsigned long i, saddr = addr;

 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
-		__ptep_get_and_clear(mm, addr, ptep);
+		__ptep_get_and_clear_anysz(mm, ptep, pgsize);

 	__flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true);
 }
@@ -219,32 +218,20 @@ void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
 	size_t pgsize;
 	int i;
 	int ncontig;
-	unsigned long pfn, dpfn;
-	pgprot_t hugeprot;

 	ncontig = num_contig_ptes(sz, &pgsize);

 	if (!pte_present(pte)) {
 		for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
-			__set_ptes(mm, addr, ptep, pte, 1);
+			__set_ptes_anysz(mm, ptep, pte, 1, pgsize);
 		return;
 	}

-	if (!pte_cont(pte)) {
-		__set_ptes(mm, addr, ptep, pte, 1);
-		return;
-	}
-
-	pfn = pte_pfn(pte);
-	dpfn = pgsize >> PAGE_SHIFT;
-	hugeprot = pte_pgprot(pte);
-
 	/* Only need to "break" if transitioning valid -> valid. */
-	if (pte_valid(__ptep_get(ptep)))
+	if (pte_cont(pte) && pte_valid(__ptep_get(ptep)))
 		clear_flush(mm, addr, ptep, pgsize, ncontig);

-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
+	__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
 }

 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
@@ -434,11 +421,9 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 			       unsigned long addr, pte_t *ptep,
 			       pte_t pte, int dirty)
 {
-	int ncontig, i;
+	int ncontig;
 	size_t pgsize = 0;
-	unsigned long pfn = pte_pfn(pte), dpfn;
 	struct mm_struct *mm = vma->vm_mm;
-	pgprot_t hugeprot;
 	pte_t orig_pte;

 	VM_WARN_ON(!pte_present(pte));
@@ -447,7 +432,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 		return __ptep_set_access_flags(vma, addr, ptep, pte, dirty);

 	ncontig = num_contig_ptes(huge_page_size(hstate_vma(vma)), &pgsize);
-	dpfn = pgsize >> PAGE_SHIFT;

 	if (!__cont_access_flags_changed(ptep, pte, ncontig))
 		return 0;
@@ -462,19 +446,14 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	if (pte_young(orig_pte))
 		pte = pte_mkyoung(pte);

-	hugeprot = pte_pgprot(pte);
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
-
+	__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
 	return 1;
 }

 void huge_ptep_set_wrprotect(struct mm_struct *mm,
 			     unsigned long addr, pte_t *ptep)
 {
-	unsigned long pfn, dpfn;
-	pgprot_t hugeprot;
-	int ncontig, i;
+	int ncontig;
 	size_t pgsize;
 	pte_t pte;

@@ -487,16 +466,11 @@ void huge_ptep_set_wrprotect(struct mm_struct *mm,
 	}

 	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-	dpfn = pgsize >> PAGE_SHIFT;

 	pte = get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
 	pte = pte_wrprotect(pte);

-	hugeprot = pte_pgprot(pte);
-	pfn = pte_pfn(pte);
-
-	for (i = 0; i < ncontig; i++, ptep++, addr += pgsize, pfn += dpfn)
-		__set_ptes(mm, addr, ptep, pfn_pte(pfn, hugeprot), 1);
+	__set_ptes_anysz(mm, ptep, pte, ncontig, pgsize);
 }

 pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
@@ -505,13 +479,6 @@ pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	size_t pgsize;
 	int ncontig;
-	pte_t pte;
-
-	pte = __ptep_get(ptep);
-	VM_WARN_ON(!pte_present(pte));
-
-	if (!pte_cont(pte))
-		return ptep_clear_flush(vma, addr, ptep);

 	ncontig = num_contig_ptes(huge_page_size(hstate_vma(vma)), &pgsize);
 	return get_clear_contig_flush(mm, addr, ptep, pgsize, ncontig);
-- 
2.43.0
From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 06/11] arm64/mm: Hoist barriers out of set_ptes_anysz() loop
Date: Tue, 22 Apr 2025 09:18:14 +0100
Message-ID: <20250422081822.1836315-7-ryan.roberts@arm.com>

set_ptes_anysz() previously called __set_pte() for each PTE in the
range, which would conditionally issue a DSB and ISB to make the new
PTE value immediately visible to the table walker if the new PTE was
valid and for kernel space.

We can do better than this: let's hoist those barriers out of the loop
so that they are only issued once at the end of the loop. This reduces
the cost by a factor of the number of PTEs in the range.
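For illustration, a userspace model of the hoisting: writing N entries
used to cost N barrier pairs, whereas after the change the nosync
writes happen in the loop and a single completion barrier is issued at
the end (the counter stands in for the real dsb/isb pair; sketch only):

#include <assert.h>
#include <stdio.h>

static int barriers;

static void set_pte_nosync(unsigned long *p, unsigned long v) { *p = v; }
static void set_pte_complete(void)
{
	barriers++;	/* stand-in for dsb(ishst); isb(); */
}

static void set_ptes(unsigned long *p, unsigned long v, unsigned int nr)
{
	for (unsigned int i = 0; i < nr; i++)
		set_pte_nosync(&p[i], v + i);
	set_pte_complete();	/* hoisted out of the loop */
}

int main(void)
{
	unsigned long e[16];

	set_ptes(e, 1, 16);
	assert(barriers == 1);	/* one barrier pair for 16 entries */
	printf("ok\n");
	return 0;
}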
Reviewed-by: Catalin Marinas
Reviewed-by: Anshuman Khandual
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 arch/arm64/include/asm/pgtable.h | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index d80aa9ba0a16..39c331743b69 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -320,13 +320,11 @@ static inline void __set_pte_nosync(pte_t *ptep, pte_t pte)
 	WRITE_ONCE(*ptep, pte);
 }

-static inline void __set_pte(pte_t *ptep, pte_t pte)
+static inline void __set_pte_complete(pte_t pte)
 {
-	__set_pte_nosync(ptep, pte);
-
 	/*
 	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
-	 * or update_mmu_cache() have the necessary barriers.
+	 * has the necessary barriers.
 	 */
 	if (pte_valid_not_user(pte)) {
 		dsb(ishst);
@@ -334,6 +332,12 @@ static inline void __set_pte(pte_t *ptep, pte_t pte)
 	}
 }

+static inline void __set_pte(pte_t *ptep, pte_t pte)
+{
+	__set_pte_nosync(ptep, pte);
+	__set_pte_complete(pte);
+}
+
 static inline pte_t __ptep_get(pte_t *ptep)
 {
 	return READ_ONCE(*ptep);
@@ -658,12 +662,14 @@ static inline void __set_ptes_anysz(struct mm_struct *mm, pte_t *ptep,

 	for (;;) {
 		__check_safe_pte_update(mm, ptep, pte);
-		__set_pte(ptep, pte);
+		__set_pte_nosync(ptep, pte);
 		if (--nr == 0)
 			break;
 		ptep++;
 		pte = pte_advance_pfn(pte, stride);
 	}
+
+	__set_pte_complete(pte);
 }

 static inline void __set_ptes(struct mm_struct *mm,
-- 
2.43.0

From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 07/11] mm/vmalloc: Warn on improper use of vunmap_range()
Date: Tue, 22 Apr 2025 09:18:15 +0100
Message-ID: <20250422081822.1836315-8-ryan.roberts@arm.com>

A call to vmalloc_huge() may cause memory blocks to be mapped at pmd or
pud level. But it is possible to subsequently call vunmap_range() on a
sub-range of the mapped memory, which partially overlaps a pmd or pud.
In this case, vmalloc unmaps the entire pmd or pud, so that the
non-overlapping portion is also unmapped. Clearly that would have a bad
outcome, but it's not something that any callers do today as far as I
can tell. So I guess it's just expected that callers will not do this.

However, it would be useful to know if this happened in future; let's
add a warning to cover the eventuality.
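For illustration, a userspace model of the warning condition: if a
cleared pmd block is not fully covered by [addr, next), the caller
asked to unmap only part of it, so warn (4K-granule PMD_SIZE assumed;
sketch only):

#include <stdio.h>

#define PMD_SIZE 0x200000UL	/* assumed 4K granule */

static void vunmap_block(unsigned long addr, unsigned long next)
{
	int cleared = 1;	/* pretend pmd_clear_huge() succeeded */

	if (cleared && next - addr < PMD_SIZE)
		fprintf(stderr, "WARN: partial unmap of a pmd block at %#lx\n",
			addr);
}

int main(void)
{
	vunmap_block(0x0, PMD_SIZE);		/* fine: whole block */
	vunmap_block(0x0, PMD_SIZE / 2);	/* triggers the warning */
	return 0;
}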
Reviewed-by: Anshuman Khandual
Reviewed-by: Catalin Marinas
Reviewed-by: Uladzislau Rezki (Sony)
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 mm/vmalloc.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 3ed720a787ec..d60d3a29d149 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -374,8 +374,10 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
 		if (cleared || pmd_bad(*pmd))
 			*mask |= PGTBL_PMD_MODIFIED;

-		if (cleared)
+		if (cleared) {
+			WARN_ON(next - addr < PMD_SIZE);
 			continue;
+		}
 		if (pmd_none_or_clear_bad(pmd))
 			continue;
 		vunmap_pte_range(pmd, addr, next, mask);
@@ -399,8 +401,10 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
 		if (cleared || pud_bad(*pud))
 			*mask |= PGTBL_PUD_MODIFIED;

-		if (cleared)
+		if (cleared) {
+			WARN_ON(next - addr < PUD_SIZE);
 			continue;
+		}
 		if (pud_none_or_clear_bad(pud))
 			continue;
 		vunmap_pmd_range(pud, addr, next, mask);
-- 
2.43.0

From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 08/11] mm/vmalloc: Gracefully unmap huge ptes
Date: Tue, 22 Apr 2025 09:18:16 +0100
Message-ID: <20250422081822.1836315-9-ryan.roberts@arm.com>

Commit f7ee1f13d606 ("mm/vmalloc: enable mapping of huge pages at pte
level in vmap") added support for this by reusing the
set_huge_pte_at() API, which is otherwise only used for user mappings.
But when unmapping those huge ptes, it continued to call
ptep_get_and_clear(), which is a layering violation. To date, the only
arch to implement this support is powerpc, and it all happens to work
ok for it.

But arm64's implementation of ptep_get_and_clear() cannot be safely
used to clear a previous set_huge_pte_at(). So let's introduce a new
arch opt-in function, arch_vmap_pte_range_unmap_size(), which can
provide the size of a (present) pte. Then we can call
huge_ptep_get_and_clear() to tear it down properly.

Note that if vunmap_range() is called with a range that starts in the
middle of a huge pte-mapped page, we must unmap the entire huge page so
that the behaviour is consistent with pmd and pud block mappings. In
this case, emit a warning just like we do for pmd/pud mappings.
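For illustration, a userspace model of the new unmap stepping: the
unmap size comes from the arch hook, the pte pointer and address
advance by that size, and a start address inside a huge pte is aligned
down (with a warning), matching the pmd/pud behaviour (constants assume
a 4K granule with a 64K contpte block; sketch only):

#include <stdio.h>

#define PAGE_SIZE	0x1000UL
#define CONT_PTE_SIZE	0x10000UL

/* stand-in for arch_vmap_pte_range_unmap_size(): pretend all contpte */
static unsigned long unmap_size(unsigned long addr)
{
	(void)addr;
	return CONT_PTE_SIZE;
}

static void vunmap_ptes(unsigned long addr, unsigned long end)
{
	while (addr != end) {
		unsigned long size = unmap_size(addr);

		if (size != PAGE_SIZE && (addr & (size - 1))) {
			fprintf(stderr, "WARN: unaligned start %#lx\n", addr);
			addr &= ~(size - 1);	/* align down to the block */
		}
		printf("clear %#lx..%#lx\n", addr, addr + size);
		addr += size;	/* pte pointer advances by size >> PAGE_SHIFT */
	}
}

int main(void)
{
	/* starts mid-block: aligned down, then two whole contpte blocks */
	vunmap_ptes(0x11000, 0x30000);
	return 0;
}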
Reviewed-by: Anshuman Khandual
Reviewed-by: Uladzislau Rezki (Sony)
Reviewed-by: Catalin Marinas
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 include/linux/vmalloc.h |  8 ++++++++
 mm/vmalloc.c            | 18 ++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
index 31e9ffd936e3..16dd4cba64f2 100644
--- a/include/linux/vmalloc.h
+++ b/include/linux/vmalloc.h
@@ -113,6 +113,14 @@ static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr, uns
 }
 #endif

+#ifndef arch_vmap_pte_range_unmap_size
+static inline unsigned long arch_vmap_pte_range_unmap_size(unsigned long addr,
+							    pte_t *ptep)
+{
+	return PAGE_SIZE;
+}
+#endif
+
 #ifndef arch_vmap_pte_supported_shift
 static inline int arch_vmap_pte_supported_shift(unsigned long size)
 {
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index d60d3a29d149..fe2e2cc8da94 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -350,12 +350,26 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			     pgtbl_mod_mask *mask)
 {
 	pte_t *pte;
+	pte_t ptent;
+	unsigned long size = PAGE_SIZE;

 	pte = pte_offset_kernel(pmd, addr);
 	do {
-		pte_t ptent = ptep_get_and_clear(&init_mm, addr, pte);
+#ifdef CONFIG_HUGETLB_PAGE
+		size = arch_vmap_pte_range_unmap_size(addr, pte);
+		if (size != PAGE_SIZE) {
+			if (WARN_ON(!IS_ALIGNED(addr, size))) {
+				addr = ALIGN_DOWN(addr, size);
+				pte = PTR_ALIGN_DOWN(pte, sizeof(*pte) * (size >> PAGE_SHIFT));
+			}
+			ptent = huge_ptep_get_and_clear(&init_mm, addr, pte, size);
+			if (WARN_ON(end - addr < size))
+				size = end - addr;
+		} else
+#endif
+			ptent = ptep_get_and_clear(&init_mm, addr, pte);
 		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
-	} while (pte++, addr += PAGE_SIZE, addr != end);
+	} while (pte += (size >> PAGE_SHIFT), addr += size, addr != end);
 	*mask |= PGTBL_PTE_MODIFIED;
 }
-- 
2.43.0

From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 09/11] arm64/mm: Support huge pte-mapped pages in vmap
Date: Tue, 22 Apr 2025 09:18:17 +0100
Message-ID: <20250422081822.1836315-10-ryan.roberts@arm.com>

Implement the required arch functions to enable use of contpte in the
vmap when VM_ALLOW_HUGE_VMAP is specified. This speeds up vmap
operations due to only having to issue a DSB and ISB per contpte block
instead of per pte. It also reduces TLB pressure, since only a single
TLB entry is needed for the whole contpte block.

Since vmap uses set_huge_pte_at() to set the contpte, that API is now
used for kernel mappings for the first time. Although in the vmap case
we never expect it to be called to modify a valid mapping, so
clear_flush() should never be called, it's still wise to make it robust
for the kernel case, so amend the tlb flush function if the mm is for
kernel space.

Tested with vmalloc performance selftests:

  # kself/mm/test_vmalloc.sh \
	run_test_mask=1 \
	test_repeat_count=5 \
	nr_pages=256 \
	test_loop_count=100000 \
	use_huge=1

Duration reduced from 1274243 usec to 1083553 usec on Apple M2 for a
15% reduction in time taken.
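For illustration, a userspace transcription of the map-size decision in
arch_vmap_pte_range_map_size() from the patch below: use a 64K contpte
block only when the remaining range, the virtual address and the
physical address are all CONT_PTE_SIZE aligned/sized (constants assume
a 4K granule; sketch only, not the kernel build):

#include <assert.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define CONT_PTE_SHIFT	16
#define CONT_PTE_SIZE	(1UL << CONT_PTE_SHIFT)
#define PFN_PHYS(pfn)	((unsigned long)(pfn) << PAGE_SHIFT)

static unsigned long map_size(unsigned long addr, unsigned long end,
			      unsigned long pfn, unsigned int max_page_shift)
{
	if (max_page_shift < CONT_PTE_SHIFT)
		return PAGE_SIZE;
	if (end - addr < CONT_PTE_SIZE)
		return PAGE_SIZE;
	if (addr & (CONT_PTE_SIZE - 1))
		return PAGE_SIZE;
	if (PFN_PHYS(pfn) & (CONT_PTE_SIZE - 1))
		return PAGE_SIZE;
	return CONT_PTE_SIZE;
}

int main(void)
{
	assert(map_size(0x10000, 0x20000, 0x10, 16) == CONT_PTE_SIZE);
	assert(map_size(0x11000, 0x20000, 0x11, 16) == PAGE_SIZE); /* unaligned va */
	assert(map_size(0x10000, 0x14000, 0x10, 16) == PAGE_SIZE); /* too short */
	printf("ok\n");
	return 0;
}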
Reviewed-by: Anshuman Khandual
Reviewed-by: Catalin Marinas
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 arch/arm64/include/asm/vmalloc.h | 45 ++++++++++++++++++++++++++++++++
 arch/arm64/mm/hugetlbpage.c      |  5 +++-
 2 files changed, 49 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/vmalloc.h b/arch/arm64/include/asm/vmalloc.h
index 38fafffe699f..12f534e8f3ed 100644
--- a/arch/arm64/include/asm/vmalloc.h
+++ b/arch/arm64/include/asm/vmalloc.h
@@ -23,6 +23,51 @@ static inline bool arch_vmap_pmd_supported(pgprot_t prot)
 	return !IS_ENABLED(CONFIG_PTDUMP_DEBUGFS);
 }

+#define arch_vmap_pte_range_map_size arch_vmap_pte_range_map_size
+static inline unsigned long arch_vmap_pte_range_map_size(unsigned long addr,
+						unsigned long end, u64 pfn,
+						unsigned int max_page_shift)
+{
+	/*
+	 * If the block is at least CONT_PTE_SIZE in size, and is naturally
+	 * aligned in both virtual and physical space, then we can pte-map the
+	 * block using the PTE_CONT bit for more efficient use of the TLB.
+	 */
+	if (max_page_shift < CONT_PTE_SHIFT)
+		return PAGE_SIZE;
+
+	if (end - addr < CONT_PTE_SIZE)
+		return PAGE_SIZE;
+
+	if (!IS_ALIGNED(addr, CONT_PTE_SIZE))
+		return PAGE_SIZE;
+
+	if (!IS_ALIGNED(PFN_PHYS(pfn), CONT_PTE_SIZE))
+		return PAGE_SIZE;
+
+	return CONT_PTE_SIZE;
+}
+
+#define arch_vmap_pte_range_unmap_size arch_vmap_pte_range_unmap_size
+static inline unsigned long arch_vmap_pte_range_unmap_size(unsigned long addr,
+							   pte_t *ptep)
+{
+	/*
+	 * The caller handles alignment so it's sufficient just to check
+	 * PTE_CONT.
+	 */
+	return pte_valid_cont(__ptep_get(ptep)) ? CONT_PTE_SIZE : PAGE_SIZE;
+}
+
+#define arch_vmap_pte_supported_shift arch_vmap_pte_supported_shift
+static inline int arch_vmap_pte_supported_shift(unsigned long size)
+{
+	if (size >= CONT_PTE_SIZE)
+		return CONT_PTE_SHIFT;
+
+	return PAGE_SHIFT;
+}
+
 #endif

 #define arch_vmap_pgprot_tagged arch_vmap_pgprot_tagged
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index d34703846ef4..0c8737f4f2ce 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -209,7 +209,10 @@ static void clear_flush(struct mm_struct *mm,
 	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
 		__ptep_get_and_clear_anysz(mm, ptep, pgsize);

-	__flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true);
+	if (mm == &init_mm)
+		flush_tlb_kernel_range(saddr, addr);
+	else
+		__flush_hugetlb_tlb_range(&vma, saddr, addr, pgsize, true);
 }

 void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-- 
2.43.0

From: Ryan Roberts <ryan.roberts@arm.com>
Subject: [PATCH v4 10/11] mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
Date: Tue, 22 Apr 2025 09:18:18 +0100
Message-ID: <20250422081822.1836315-11-ryan.roberts@arm.com>
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Pasha Tatashin, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, David Hildenbrand,
	"Matthew Wilcox (Oracle)", Mark Rutland, Anshuman Khandual,
	Alexandre Ghiti, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 10/11] mm/vmalloc: Enter lazy mmu mode while manipulating vmalloc ptes
Date: Tue, 22 Apr 2025 09:18:18 +0100
Message-ID: <20250422081822.1836315-11-ryan.roberts@arm.com>
In-Reply-To: <20250422081822.1836315-1-ryan.roberts@arm.com>
References: <20250422081822.1836315-1-ryan.roberts@arm.com>

Wrap vmalloc's pte table manipulation loops with
arch_enter_lazy_mmu_mode() / arch_leave_lazy_mmu_mode(). This gives the
arch code the opportunity to optimize the pte manipulations.

Note that vmap_pfn() already uses lazy mmu mode, since it delegates to
apply_to_page_range(), which enters lazy mmu mode for both user and
kernel mappings.

These hooks will shortly be used by arm64 to improve vmalloc
performance.
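
Architectures that do not define these hooks are unaffected: the
generic header provides no-op fallbacks, so the added brackets compile
away. A rough sketch of the usual fallback pattern (simplified from
include/linux/pgtable.h; exact definitions may differ by kernel
version):

  #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
  /* No-op fallbacks: lazy mmu mode costs nothing on other arches. */
  #define arch_enter_lazy_mmu_mode()	do {} while (0)
  #define arch_leave_lazy_mmu_mode()	do {} while (0)
  #define arch_flush_lazy_mmu_mode()	do {} while (0)
  #endif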
Reviewed-by: Uladzislau Rezki (Sony)
Reviewed-by: Catalin Marinas
Reviewed-by: Anshuman Khandual
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 mm/vmalloc.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index fe2e2cc8da94..24430160b37f 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -104,6 +104,9 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	pte = pte_alloc_kernel_track(pmd, addr, mask);
 	if (!pte)
 		return -ENOMEM;
+
+	arch_enter_lazy_mmu_mode();
+
 	do {
 		if (unlikely(!pte_none(ptep_get(pte)))) {
 			if (pfn_valid(pfn)) {
@@ -127,6 +130,8 @@ static int vmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		set_pte_at(&init_mm, addr, pte, pfn_pte(pfn, prot));
 		pfn++;
 	} while (pte += PFN_DOWN(size), addr += size, addr != end);
+
+	arch_leave_lazy_mmu_mode();
 	*mask |= PGTBL_PTE_MODIFIED;
 	return 0;
 }
@@ -354,6 +359,8 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	unsigned long size = PAGE_SIZE;
 
 	pte = pte_offset_kernel(pmd, addr);
+	arch_enter_lazy_mmu_mode();
+
 	do {
 #ifdef CONFIG_HUGETLB_PAGE
 		size = arch_vmap_pte_range_unmap_size(addr, pte);
@@ -370,6 +377,8 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		ptent = ptep_get_and_clear(&init_mm, addr, pte);
 		WARN_ON(!pte_none(ptent) && !pte_present(ptent));
 	} while (pte += (size >> PAGE_SHIFT), addr += size, addr != end);
+
+	arch_leave_lazy_mmu_mode();
 	*mask |= PGTBL_PTE_MODIFIED;
 }
 
@@ -515,6 +524,9 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	pte = pte_alloc_kernel_track(pmd, addr, mask);
 	if (!pte)
 		return -ENOMEM;
+
+	arch_enter_lazy_mmu_mode();
+
 	do {
 		struct page *page = pages[*nr];
 
@@ -528,6 +540,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
 		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
 		(*nr)++;
 	} while (pte++, addr += PAGE_SIZE, addr != end);
+
+	arch_leave_lazy_mmu_mode();
 	*mask |= PGTBL_PTE_MODIFIED;
 	return 0;
 }
-- 
2.43.0

From nobody Fri Dec 19 13:50:35 2025
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Pasha Tatashin, Andrew Morton,
	Uladzislau Rezki, Christoph Hellwig, David Hildenbrand,
	"Matthew Wilcox (Oracle)", Mark Rutland, Anshuman Khandual,
	Alexandre Ghiti, Kevin Brodsky
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [PATCH v4 11/11] arm64/mm: Batch barriers when updating kernel mappings
Date: Tue, 22 Apr 2025 09:18:19 +0100
Message-ID: <20250422081822.1836315-12-ryan.roberts@arm.com>
In-Reply-To: <20250422081822.1836315-1-ryan.roberts@arm.com>
References: <20250422081822.1836315-1-ryan.roberts@arm.com>

Because the kernel can't tolerate page faults for kernel mappings, when
setting a valid, kernel space pte (or pmd/pud/p4d/pgd), the arm64 code
emits a dsb(ishst) to ensure that the store to the pgtable is observed
by the table walker immediately. Additionally it emits an isb() to
ensure that any fault the CPU has already speculatively determined for
the previously invalid mapping gets cancelled.

We can improve the performance of vmalloc operations by batching these
barriers until the end of a set of entry updates.
arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() provide the
required hooks; vmalloc improves by up to 30% as a result.

Two new TIF_ flags are created: TIF_LAZY_MMU tells us whether the task
is in lazy mmu mode and can therefore defer any barriers until exit
from that mode; TIF_LAZY_MMU_PENDING remembers whether any pte
operation that required barriers was performed while in lazy mode. When
leaving lazy mode, if that flag is set, we emit the barriers. Since
arch_enter_lazy_mmu_mode() and arch_leave_lazy_mmu_mode() are used for
both user and kernel mappings, the second flag avoids emitting barriers
unnecessarily when only user mappings were updated.

Reviewed-by: Catalin Marinas
Signed-off-by: Ryan Roberts
Tested-by: Luiz Capitulino
---
 arch/arm64/include/asm/pgtable.h     | 81 ++++++++++++++++++++++------
 arch/arm64/include/asm/thread_info.h |  2 +
 arch/arm64/kernel/process.c          |  9 ++--
 3 files changed, 72 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 39c331743b69..ab4a1b19e596 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -40,6 +40,63 @@
 #include
 #include
 
+static inline void emit_pte_barriers(void)
+{
+	/*
+	 * These barriers are emitted under certain conditions after a pte entry
+	 * was modified (see e.g. __set_pte_complete()). The dsb makes the store
+	 * visible to the table walker. The isb ensures that any previous
+	 * speculative "invalid translation" marker that is in the CPU's
+	 * pipeline gets cleared, so that any access to that address after
+	 * setting the pte to valid won't cause a spurious fault. If the thread
+	 * gets preempted after storing to the pgtable but before emitting these
+	 * barriers, __switch_to() emits a dsb which ensures the walker gets to
+	 * see the store. There is no guarantee of an isb being issued though.
+	 * This is safe because it will still get issued (albeit on a
+	 * potentially different CPU) when the thread starts running again,
+	 * before any access to the address.
+	 */
+	dsb(ishst);
+	isb();
+}
+
+static inline void queue_pte_barriers(void)
+{
+	unsigned long flags;
+
+	VM_WARN_ON(in_interrupt());
+	flags = read_thread_flags();
+
+	if (flags & BIT(TIF_LAZY_MMU)) {
+		/* Avoid the atomic op if already set. */
+		if (!(flags & BIT(TIF_LAZY_MMU_PENDING)))
+			set_thread_flag(TIF_LAZY_MMU_PENDING);
+	} else {
+		emit_pte_barriers();
+	}
+}
+
+#define __HAVE_ARCH_ENTER_LAZY_MMU_MODE
+static inline void arch_enter_lazy_mmu_mode(void)
+{
+	VM_WARN_ON(in_interrupt());
+	VM_WARN_ON(test_thread_flag(TIF_LAZY_MMU));
+
+	set_thread_flag(TIF_LAZY_MMU);
+}
+
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+	if (test_and_clear_thread_flag(TIF_LAZY_MMU_PENDING))
+		emit_pte_barriers();
+}
+
+static inline void arch_leave_lazy_mmu_mode(void)
+{
+	arch_flush_lazy_mmu_mode();
+	clear_thread_flag(TIF_LAZY_MMU);
+}
+
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 #define __HAVE_ARCH_FLUSH_PMD_TLB_RANGE
 
@@ -326,10 +383,8 @@ static inline void __set_pte_complete(pte_t pte)
 	 * Only if the new pte is valid and kernel, otherwise TLB maintenance
 	 * has the necessary barriers.
 	 */
-	if (pte_valid_not_user(pte)) {
-		dsb(ishst);
-		isb();
-	}
+	if (pte_valid_not_user(pte))
+		queue_pte_barriers();
 }
 
 static inline void __set_pte(pte_t *ptep, pte_t pte)
@@ -801,10 +856,8 @@ static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 
 	WRITE_ONCE(*pmdp, pmd);
 
-	if (pmd_valid(pmd)) {
-		dsb(ishst);
-		isb();
-	}
+	if (pmd_valid(pmd))
+		queue_pte_barriers();
 }
 
 static inline void pmd_clear(pmd_t *pmdp)
@@ -869,10 +922,8 @@ static inline void set_pud(pud_t *pudp, pud_t pud)
 
 	WRITE_ONCE(*pudp, pud);
 
-	if (pud_valid(pud)) {
-		dsb(ishst);
-		isb();
-	}
+	if (pud_valid(pud))
+		queue_pte_barriers();
 }
 
 static inline void pud_clear(pud_t *pudp)
@@ -951,8 +1002,7 @@ static inline void set_p4d(p4d_t *p4dp, p4d_t p4d)
 	}
 
 	WRITE_ONCE(*p4dp, p4d);
-	dsb(ishst);
-	isb();
+	queue_pte_barriers();
 }
 
 static inline void p4d_clear(p4d_t *p4dp)
@@ -1080,8 +1130,7 @@ static inline void set_pgd(pgd_t *pgdp, pgd_t pgd)
 	}
 
 	WRITE_ONCE(*pgdp, pgd);
-	dsb(ishst);
-	isb();
+	queue_pte_barriers();
 }
 
 static inline void pgd_clear(pgd_t *pgdp)
diff --git a/arch/arm64/include/asm/thread_info.h b/arch/arm64/include/asm/thread_info.h
index 1114c1c3300a..1fdd74b7b831 100644
--- a/arch/arm64/include/asm/thread_info.h
+++ b/arch/arm64/include/asm/thread_info.h
@@ -82,6 +82,8 @@ void arch_setup_new_exec(void);
 #define TIF_SME_VL_INHERIT	28	/* Inherit SME vl_onexec across exec */
 #define TIF_KERNEL_FPSTATE	29	/* Task is in a kernel mode FPSIMD section */
 #define TIF_TSC_SIGSEGV		30	/* SIGSEGV on counter-timer access */
+#define TIF_LAZY_MMU		31	/* Task in lazy mmu mode */
+#define TIF_LAZY_MMU_PENDING	32	/* Ops pending for lazy mmu mode exit */
 
 #define _TIF_SIGPENDING		(1 << TIF_SIGPENDING)
 #define _TIF_NEED_RESCHED	(1 << TIF_NEED_RESCHED)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 42faebb7b712..45a55fe81788 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -680,10 +680,11 @@ struct task_struct *__switch_to(struct task_struct *prev,
 	gcs_thread_switch(next);
 
 	/*
-	 * Complete any pending TLB or cache maintenance on this CPU in case
-	 * the thread migrates to a different CPU.
-	 * This full barrier is also required by the membarrier system
-	 * call.
+	 * Complete any pending TLB or cache maintenance on this CPU in case the
+	 * thread migrates to a different CPU. This full barrier is also
+	 * required by the membarrier system call. Additionally it makes any
+	 * in-progress pgtable writes visible to the table walker; see
+	 * emit_pte_barriers().
 	 */
 	dsb(ish);
 
-- 
2.43.0
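
The effect of the two TIF_ flags is easy to see in isolation. Below is
a small, self-contained C model of the batching logic (hypothetical
userspace code: booleans stand in for the thread flags and a counter
stands in for the dsb(ishst)/isb() pair); it is a sketch of the idea,
not the kernel implementation:

  #include <stdbool.h>
  #include <stdio.h>

  static bool lazy;          /* models TIF_LAZY_MMU */
  static bool pending;       /* models TIF_LAZY_MMU_PENDING */
  static int barrier_pairs;  /* counts emitted dsb+isb pairs */

  static void emit_pte_barriers(void) { barrier_pairs++; }

  static void queue_pte_barriers(void)
  {
          if (lazy)
                  pending = true;        /* defer until leaving lazy mode */
          else
                  emit_pte_barriers();   /* immediate, as before the patch */
  }

  static void enter_lazy(void) { lazy = true; }

  static void leave_lazy(void)
  {
          if (pending) {
                  emit_pte_barriers();   /* one pair covers the whole batch */
                  pending = false;
          }
          lazy = false;
  }

  int main(void)
  {
          int i;

          for (i = 0; i < 16; i++)       /* e.g. one contpte block of ptes */
                  queue_pte_barriers();
          printf("eager: %d pairs\n", barrier_pairs);   /* 16 */

          barrier_pairs = 0;
          enter_lazy();
          for (i = 0; i < 16; i++)
                  queue_pte_barriers();
          leave_lazy();
          printf("lazy:  %d pairs\n", barrier_pairs);   /* 1 */
          return 0;
  }

Running it prints 16 barrier pairs for the eager path and 1 for the
lazy path, which is exactly the saving the patch exploits for vmalloc.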