From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, david@redhat.com
Cc: ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, baohua@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Dev Jain <dev.jain@arm.com>
Subject: [PATCH v4 1/3] mm: add get_and_clear_ptes() and clear_ptes()
Date: Thu, 24 Jul 2025 10:52:59 +0530
Message-Id: <20250724052301.23844-2-dev.jain@arm.com>
In-Reply-To: <20250724052301.23844-1-dev.jain@arm.com>
References: <20250724052301.23844-1-dev.jain@arm.com>

From: David Hildenbrand <david@redhat.com>

Let's add variants to be used where "full" does not apply -- which will
be the majority of cases in the future. "full" really only applies if
we are about to tear down a full MM.

Use get_and_clear_ptes() in existing code; clear_ptes() users will be
added next.
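For illustration, a caller outside of MM teardown would use the new
variant roughly like this (a minimal sketch with a hypothetical helper,
not part of the patch):

/*
 * Hypothetical caller sketch, for illustration only: clear @nr PTEs
 * that map consecutive pages of one folio, collecting dirty/accessed
 * bits, without pretending we are tearing down the whole MM.
 *
 * Assumed caller contract (matching the kerneldoc below): page table
 * lock held, all entries in the same PMD, all mapping the same folio.
 */
static pte_t zap_folio_ptes(struct vm_area_struct *vma, unsigned long addr,
			    pte_t *ptep, unsigned int nr)
{
	/* Equivalent to get_and_clear_full_ptes(mm, addr, ptep, nr, 0). */
	return get_and_clear_ptes(vma->vm_mm, addr, ptep, nr);
}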
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Barry Song <baohua@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
 arch/arm64/mm/mmu.c     |  2 +-
 include/linux/pgtable.h | 45 +++++++++++++++++++++++++++++++++++++++++
 mm/mremap.c             |  2 +-
 mm/rmap.c               |  2 +-
 4 files changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index abd9725796e9..20a89ab97dc5 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1528,7 +1528,7 @@ early_initcall(prevent_bootmem_remove_init);
 pte_t modify_prot_start_ptes(struct vm_area_struct *vma, unsigned long addr,
			     pte_t *ptep, unsigned int nr)
 {
-	pte_t pte = get_and_clear_full_ptes(vma->vm_mm, addr, ptep, nr, /* full = */ 0);
+	pte_t pte = get_and_clear_ptes(vma->vm_mm, addr, ptep, nr);
 
 	if (alternative_has_cap_unlikely(ARM64_WORKAROUND_2645198)) {
 		/*
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index e3b99920be05..4c035637eeb7 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -736,6 +736,29 @@ static inline pte_t get_and_clear_full_ptes(struct mm_struct *mm,
 }
 #endif
 
+/**
+ * get_and_clear_ptes - Clear present PTEs that map consecutive pages of
+ *			the same folio, collecting dirty/accessed bits.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear.
+ *
+ * Use this instead of get_and_clear_full_ptes() if it is known that we don't
+ * need to clear the full mm, which is mostly the case.
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+static inline pte_t get_and_clear_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
+{
+	return get_and_clear_full_ptes(mm, addr, ptep, nr, 0);
+}
+
 #ifndef clear_full_ptes
 /**
  * clear_full_ptes - Clear present PTEs that map consecutive pages of the same
@@ -768,6 +791,28 @@ static inline void clear_full_ptes(struct mm_struct *mm, unsigned long addr,
 }
 #endif
 
+/**
+ * clear_ptes - Clear present PTEs that map consecutive pages of the same folio.
+ * @mm: Address space the pages are mapped into.
+ * @addr: Address the first page is mapped at.
+ * @ptep: Page table pointer for the first entry.
+ * @nr: Number of entries to clear.
+ *
+ * Use this instead of clear_full_ptes() if it is known that we don't need to
+ * clear the full mm, which is mostly the case.
+ *
+ * Note that PTE bits in the PTE range besides the PFN can differ. For example,
+ * some PTEs might be write-protected.
+ *
+ * Context: The caller holds the page table lock. The PTEs map consecutive
+ * pages that belong to the same folio. The PTEs are all in the same PMD.
+ */
+static inline void clear_ptes(struct mm_struct *mm, unsigned long addr,
+		pte_t *ptep, unsigned int nr)
+{
+	clear_full_ptes(mm, addr, ptep, nr, 0);
+}
+
 /*
  * If two threads concurrently fault at the same page, the thread that
  * won the race updates the PTE and its local TLB/Cache. The other thread
diff --git a/mm/mremap.c b/mm/mremap.c
index ac39845e9718..677a4d744df9 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -280,7 +280,7 @@ static int move_ptes(struct pagetable_move_control *pmc,
					  old_pte, max_nr_ptes);
 			force_flush = true;
 		}
-		pte = get_and_clear_full_ptes(mm, old_addr, old_ptep, nr_ptes, 0);
+		pte = get_and_clear_ptes(mm, old_addr, old_ptep, nr_ptes);
 		pte = move_pte(pte, old_addr, new_addr);
 		pte = move_soft_dirty_pte(pte);
 
diff --git a/mm/rmap.c b/mm/rmap.c
index f93ce27132ab..568198e9efc2 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2036,7 +2036,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			flush_cache_range(vma, address, end_addr);
 
 			/* Nuke the page table entry. */
-			pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+			pteval = get_and_clear_ptes(mm, address, pvmw.pte, nr_pages);
 			/*
 			 * We clear the PTE but do not flush so potentially
 			 * a remote CPU could still be writing to the folio.
-- 
2.30.2

From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, david@redhat.com
Cc: ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, baohua@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Dev Jain <dev.jain@arm.com>
Subject: [PATCH v4 2/3] khugepaged: Optimize __collapse_huge_page_copy_succeeded() by PTE batching
Date: Thu, 24 Jul 2025 10:53:00 +0530
Message-Id: <20250724052301.23844-3-dev.jain@arm.com>
In-Reply-To: <20250724052301.23844-1-dev.jain@arm.com>
References: <20250724052301.23844-1-dev.jain@arm.com>

Use PTE batching to batch-process PTEs mapping the same large folio. An
improvement is expected from batching the refcount and mapcount
manipulation on the folios; on arm64, which supports contig mappings,
the number of TLB flushes is also reduced.
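The shape of the change: instead of stepping one PTE at a time, ask
folio_pte_batch() how many consecutive PTEs map the same large folio,
then process the whole run with one call each to clear_ptes(),
folio_remove_rmap_ptes() and folio_put_refs(). A condensed sketch of
that pattern (a hypothetical standalone function, simplified from the
diff below; locking and the pte_none()/zero-pfn/swap cases are elided):

/* Condensed sketch of the batched walk, for illustration only. */
static void copy_succeeded_batched(struct vm_area_struct *vma, pte_t *pte,
				   unsigned long address)
{
	unsigned long end = address + HPAGE_PMD_SIZE;
	unsigned int nr_ptes;
	pte_t *_pte;

	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
	     _pte += nr_ptes, address += nr_ptes * PAGE_SIZE) {
		pte_t pteval = ptep_get(_pte);
		struct page *src_page = pte_page(pteval);
		struct folio *src = page_folio(src_page);

		nr_ptes = 1;
		if (folio_test_large(src)) {
			/* Cap the batch at the end of the PMD range. */
			unsigned int max_nr_ptes = (end - address) >> PAGE_SHIFT;

			nr_ptes = folio_pte_batch(src, _pte, pteval, max_nr_ptes);
		}

		/* One call per batch instead of one call per PTE. */
		clear_ptes(vma->vm_mm, address, _pte, nr_ptes);
		folio_remove_rmap_ptes(src, src_page, nr_ptes, vma);
		folio_put_refs(src, nr_ptes);
	}
}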
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 25 ++++++++++++++++++-------
 1 file changed, 18 insertions(+), 7 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a55fb1dcd224..f23e943506bc 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -700,12 +700,15 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
						spinlock_t *ptl,
						struct list_head *compound_pagelist)
 {
+	unsigned long end = address + HPAGE_PMD_SIZE;
 	struct folio *src, *tmp;
-	pte_t *_pte;
 	pte_t pteval;
+	pte_t *_pte;
+	unsigned int nr_ptes;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
-	     _pte++, address += PAGE_SIZE) {
+	for (_pte = pte; _pte < pte + HPAGE_PMD_NR; _pte += nr_ptes,
+	     address += nr_ptes * PAGE_SIZE) {
+		nr_ptes = 1;
 		pteval = ptep_get(_pte);
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1);
@@ -722,18 +725,26 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 			struct page *src_page = pte_page(pteval);
 
 			src = page_folio(src_page);
-			if (!folio_test_large(src))
+
+			if (folio_test_large(src)) {
+				unsigned int max_nr_ptes = (end - address) >> PAGE_SHIFT;
+
+				nr_ptes = folio_pte_batch(src, _pte, pteval, max_nr_ptes);
+			} else {
 				release_pte_folio(src);
+			}
+
 			/*
 			 * ptl mostly unnecessary, but preempt has to
 			 * be disabled to update the per-cpu stats
 			 * inside folio_remove_rmap_pte().
 			 */
 			spin_lock(ptl);
-			ptep_clear(vma->vm_mm, address, _pte);
-			folio_remove_rmap_pte(src, src_page, vma);
+			clear_ptes(vma->vm_mm, address, _pte, nr_ptes);
+			folio_remove_rmap_ptes(src, src_page, nr_ptes, vma);
 			spin_unlock(ptl);
-			free_folio_and_swap_cache(src);
+			free_swap_cache(src);
+			folio_put_refs(src, nr_ptes);
 		}
 	}
 
-- 
2.30.2
From: Dev Jain <dev.jain@arm.com>
To: akpm@linux-foundation.org, david@redhat.com
Cc: ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	npache@redhat.com, ryan.roberts@arm.com, baohua@kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Dev Jain <dev.jain@arm.com>
Subject: [PATCH v4 3/3] khugepaged: Optimize collapse_pte_mapped_thp() by PTE batching
Date: Thu, 24 Jul 2025 10:53:01 +0530
Message-Id: <20250724052301.23844-4-dev.jain@arm.com>
In-Reply-To: <20250724052301.23844-1-dev.jain@arm.com>
References: <20250724052301.23844-1-dev.jain@arm.com>

Use PTE batching to batch-process PTEs mapping the same large folio. An
improvement is expected from batching the mapcount manipulation on the
folios; on arm64, which supports contig mappings, the number of TLB
flushes is also reduced.

Note that the check "if (folio_page(folio, i) != page)" does not need
to change: if the i'th page of the folio equals the first page of our
batch, then pages i + 1, ..., i + nr_batch_ptes - 1 of the folio equal
the corresponding pages of the batch, since the batch maps consecutive
pages.
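To make that last point concrete, here is an illustration-only sketch
(batch_matches_folio() is a hypothetical helper, not part of the patch)
of why one comparison per batch suffices:

/*
 * Illustration only: if the first PTE of the batch maps folio page i,
 * folio_pte_batch() guarantees that PTE k of the batch maps folio page
 * i + k, so checking the first page covers the whole batch.
 */
static bool batch_matches_folio(struct folio *folio, int i,
				struct page *page, unsigned int nr_batch_ptes)
{
	/*
	 * Equivalent to checking folio_page(folio, i + k) == page + k
	 * for all 0 <= k < nr_batch_ptes.
	 */
	return folio_page(folio, i) == page;
}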
Acked-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
---
 mm/khugepaged.c | 33 +++++++++++++++++++++------------
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f23e943506bc..374a6a5193a7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1503,15 +1503,17 @@ static int set_huge_pmd(struct vm_area_struct *vma, unsigned long addr,
 int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
			    bool install_pmd)
 {
+	int nr_mapped_ptes = 0, result = SCAN_FAIL;
+	unsigned int nr_batch_ptes;
 	struct mmu_notifier_range range;
 	bool notified = false;
 	unsigned long haddr = addr & HPAGE_PMD_MASK;
+	unsigned long end = haddr + HPAGE_PMD_SIZE;
 	struct vm_area_struct *vma = vma_lookup(mm, haddr);
 	struct folio *folio;
 	pte_t *start_pte, *pte;
 	pmd_t *pmd, pgt_pmd;
 	spinlock_t *pml = NULL, *ptl;
-	int nr_ptes = 0, result = SCAN_FAIL;
 	int i;
 
 	mmap_assert_locked(mm);
@@ -1625,11 +1627,15 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 		goto abort;
 
 	/* step 2: clear page table and adjust rmap */
-	for (i = 0, addr = haddr, pte = start_pte;
-	     i < HPAGE_PMD_NR; i++, addr += PAGE_SIZE, pte++) {
+	for (i = 0, addr = haddr, pte = start_pte; i < HPAGE_PMD_NR;
+	     i += nr_batch_ptes, addr += nr_batch_ptes * PAGE_SIZE,
+	     pte += nr_batch_ptes) {
+		unsigned int max_nr_batch_ptes = (end - addr) >> PAGE_SHIFT;
 		struct page *page;
 		pte_t ptent = ptep_get(pte);
 
+		nr_batch_ptes = 1;
+
 		if (pte_none(ptent))
 			continue;
 		/*
@@ -1643,26 +1649,29 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
 			goto abort;
 		}
 		page = vm_normal_page(vma, addr, ptent);
+
 		if (folio_page(folio, i) != page)
 			goto abort;
 
+		nr_batch_ptes = folio_pte_batch(folio, pte, ptent, max_nr_batch_ptes);
+
 		/*
 		 * Must clear entry, or a racing truncate may re-remove it.
 		 * TLB flush can be left until pmdp_collapse_flush() does it.
 		 * PTE dirty? Shmem page is already dirty; file is read-only.
 		 */
-		ptep_clear(mm, addr, pte);
-		folio_remove_rmap_pte(folio, page, vma);
-		nr_ptes++;
+		clear_ptes(mm, addr, pte, nr_batch_ptes);
+		folio_remove_rmap_ptes(folio, page, nr_batch_ptes, vma);
+		nr_mapped_ptes += nr_batch_ptes;
 	}
 
 	if (!pml)
 		spin_unlock(ptl);
 
 	/* step 3: set proper refcount and mm_counters. */
-	if (nr_ptes) {
-		folio_ref_sub(folio, nr_ptes);
-		add_mm_counter(mm, mm_counter_file(folio), -nr_ptes);
+	if (nr_mapped_ptes) {
+		folio_ref_sub(folio, nr_mapped_ptes);
+		add_mm_counter(mm, mm_counter_file(folio), -nr_mapped_ptes);
 	}
 
 	/* step 4: remove empty page table */
@@ -1695,10 +1704,10 @@ int collapse_pte_mapped_thp(struct mm_struct *mm, unsigned long addr,
			: SCAN_SUCCEED;
 	goto drop_folio;
 abort:
-	if (nr_ptes) {
+	if (nr_mapped_ptes) {
 		flush_tlb_mm(mm);
-		folio_ref_sub(folio, nr_ptes);
-		add_mm_counter(mm, mm_counter_file(folio), -nr_ptes);
+		folio_ref_sub(folio, nr_mapped_ptes);
+		add_mm_counter(mm, mm_counter_file(folio), -nr_mapped_ptes);
 	}
 unlock:
 	if (start_pte)
-- 
2.30.2